THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


THE OFFICIAL JOURNAL OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 


EDITORIAL COMMITTEE 


H. C. CARVER 
A. L. O’TOOLE 
T. E. RAIFORD 


Volume VIII, Number 2 
June, 1937 


PUBLISHED QUARTERLY 
ANN ARBOR, MICHIGAN 







The Annals is not copyrighted: any articles or tables appearing therein may 
be reproduced in whole or in part at any time if accompanied by 
the proper reference to this publication 


Four Dollars per annum 


Back numbers available at the following prices: 
Vols. I-IV $5 each. Single numbers $1.50 
Vol. V to date $4 each. Single numbers $1.25 


Made in United States of America 


Address: ANNALS OF MATHEMATICAL STATISTICS 
Post Office Box 171, Ann Arbor, Michigan 


Office of the Institute of Mathematical Statistics: 
Secretary: Allen T. Craig, University of Iowa
Iowa City, Iowa 


Composed and Printed at the
Waverly Press, Inc.
Baltimore, Md.






















REGRESSION AND CORRELATION EVALUATED BY 
A METHOD OF PARTIAL SUMS 


By Felix Bernstein


“To be sure, Laplace viewed the matter in a similar way but he selected the
absolute value of the error as a measure of loss. But if we mistake not, this
position is certainly not less arbitrary than our own; that is to say, whether the
double error is to be considered just as tolerable as, or worse than, the simple
error twice repeated and whether it is thus more fitting to ascribe to the double
error only a double weight, or a greater one, is a question which is neither in
itself clear nor determinable by mathematical proof but has to be left entirely
to individual discretion.

“Furthermore, it cannot be denied that the assumption under discussion
violates the principle of continuity and precisely for this reason the procedure
based on it strongly defies analytic treatment while the results to which our
principle leads have the advantage of simplicity as well as of generality.”
C. F. Gauss: Theoria combinationis observationum, pars prior, art. 6.


Since the “Theoria Combinationis” of C. F. Gauss appeared in the year 1821,
a century of mathematical statistics has been dominated by the ideas of this
classical treatise, ideas whose fertility does not seem to be exhausted even
today.

The germs of most modern contributions to mathematical statistics, including
those of Karl Pearson and his school, go back decidedly to this paper.
Though the immediate achievements of Gauss are so conspicuous as not to 
need any comment, a true critical appreciation of the work can be gained only 
by comparing it with the previous methods of Laplace, superseded by those of 
Gauss. 

For such critical appreciation, C. F. Gauss himself has prepared the ground 
in the lines quoted at the beginning of this article. To Gauss the standard 
deviation is a measure of uncertainty or risk of a game in which the errors of 
observation are considered as causing only losses. In this he follows the lead 
of his great predecessor. The difference between them is that Gauss adopts 
the square of the error as a measure of the loss while Laplace adopts its absolute 
value for this purpose. Either choice frees the error from its sign so that the 
loss is the same regardless of the sign of the error. 

Gauss considers this choice of the measure of the loss as purely conventional. 
Therefore he feels justified in adopting the square of the error because in adopt- 
ing the square instead of the absolute value of the error, the mathematics he 
uses remains in the easily accessible domain of analytical processes. This 
creates for these methods a superiority in elegance, simplicity, and generality. 

The modern developments of mathematical statistics, based on the principles
of Gauss, have confirmed the correctness of this viewpoint. This has proved
true particularly in the theory of analysis of variance developed by R. A. Fisher
and in the more general theory of semi-invariants, first defined by T. N. Thiele.

The inadequacy of the Gaussian method seriously impairing its value for 
statistical use has come to light through the investigations of Karl Pearson of 
distributions of one and two variables. Since the moments of higher order 
involve standard deviations of increasing magnitude the characterization of the 
distributions by means of the moments, in line with the Gauss-Thiele concepts, 
becomes practically impossible. Therefore it was of the greatest interest that 
Lindeberg was able to derive an expression for the standard deviation of a 
measure of skewness constructed not on Gaussian but on Laplacian lines, 
namely based exclusively upon the sign of the error. The mathematical diffi- 
culties surmounted by Lindeberg by a very involved and difficult analysis— 
with some clearly indicated gaps in the proofs—are precisely of the character 
of those that Gauss wished to avoid. Encouraged by the success of Lindeberg,
I have developed in two papers¹ the standard deviations of more general moments
and the correlations between them, of which the mean deviation of Laplace
and Lindeberg’s measure of skewness are special cases. The proofs have been 
arrived at by a rather simple and rigorous procedure. These new moments, 
together with the old ones, form a new system of statistical characteristics by 
which a distribution in one or two variables can be described by expressions 
of lower order and therefore of greater precision. This method makes un- 
necessary the use of moments of higher order than the third. 

But another point of interest is still involved. It has been assumed that the
Gaussian characteristics give a greater amount of information than those of
Laplace. This is proved, however, only for the case of the normal distribution

$$\frac{h}{\sqrt{\pi}}\,e^{-h^2x^2}.$$

This was recognized by Gauss himself in his paper of April, 1816,
that appeared five years earlier than the Theoria Combinationis Observationum.
In article 6 of his paper, he says that the constant h of a normal distribution
obtained from one hundred observations by the use of the standard error is
as exact as that obtained from one hundred fourteen observations in which
the mean deviation is used. Hence with a given number of observations only
the equivalent of 88% of the total are used by the second method. This does
not hold true for all distributions. The following theorem can easily be proved:
The amount of information, as defined above, furnished by the use of the mean
deviation is greater than, equal to, or less than that furnished by the standard
deviation, depending respectively upon whether

$$(\beta_2 - 1) \gtreqless 4\,(\delta_2 - 1)$$

where

$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \delta_2 = \frac{\mu_2}{\vartheta^2},$$

$\mu_k$ = the k-th moment and $\vartheta$ = the mean deviation.


1 Felix Bernstein: “Die mittleren Fehlerquadrate und Korrelationen der Potenzmomente
und ihre Anwendung auf Funktionen der Potenzmomente,” Metron, Vol. X, N. 3
(Nov. 1932).

Felix Bernstein: “Über den mittleren Fehler der Potenzmomente,” Zeitschr. f. d. ges.
Vers.-Wissenschaft, Band 30, Heft 3, March 1930.





For example, in the distribution $\tfrac{1}{2}e^{-|x|}$, the mean deviation furnishes a greater
amount of information than the standard deviation.²
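
The criterion can be checked numerically. The sketch below is an illustration only: it assumes the large-sample form of the criterion, in which the relative variance of the standard deviation is proportional to (β₂ − 1)/4 and that of the mean deviation to (μ₂/ϑ² − 1). The function name and the choice of the double-exponential law as a second test case are assumptions made here, not part of the original argument.

```python
import math


def information_gap(mu2, mu4, mean_dev):
    """Return (beta2 - 1) - 4 * (mu2 / mean_dev**2 - 1).

    A positive value means the mean deviation is the more informative
    statistic; a negative value favours the standard deviation.
    """
    beta2 = mu4 / mu2 ** 2
    return (beta2 - 1.0) - 4.0 * (mu2 / mean_dev ** 2 - 1.0)


# Normal law with unit variance: mu2 = 1, mu4 = 3, mean deviation sqrt(2/pi).
normal_gap = information_gap(1.0, 3.0, math.sqrt(2.0 / math.pi))

# Double-exponential law (1/2) exp(-|x|): mu2 = 2, mu4 = 24, mean deviation 1.
laplace_gap = information_gap(2.0, 24.0, 1.0)

# Number of observations that the mean deviation needs, under the normal law,
# to match one hundred observations treated by the standard deviation.
equivalent_n = 100.0 * 4.0 * (math.pi / 2.0 - 1.0) / (3.0 - 1.0)

print(normal_gap, laplace_gap, equivalent_n)
```

Under the normal law the gap is negative and the equivalent number of observations comes out near one hundred fourteen, in agreement with the figure quoted from Gauss.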

In the present paper, we shall discuss the practical use of expressions for 
correlation and regression in which the new type of statistics formed along 
Laplacian lines will be used. These new expressions are of a linear form and 
can be computed therefore more easily than those of Karl Pearson. The amount 
of information given by these expressions is less than that given by the expres- 
sions of Pearson if the normal law, in two variables, is fulfilled. For other 
distributions, however, this is not generally true. The determination of the 
standard deviations of these new expressions is given in Metron.³

The application of the new expressions of regression and correlation to grouped 
data is set forth here for the first time. The method is strongly recommended
for all cases in which the data lose reliability with increasing deviations from 
the mean. Deviations in the new method enter the expressions only in the 
first degree and not in the second as in the case of Pearson’s. It is obvious 
that the influence of the doubtful extreme readings is, therefore, considerably 
lessened. Since our expressions are linear, no adjustments for grouping (Shep- 
pard’s corrections) are necessary. 

It ought to be mentioned here that linear expressions for the measurement 
of correlation have been set up before. 

K. Pearson (Biometrika) and Egon Pearson (Biometrika) have derived an
expression called “linear correlation ratio” which in case of linear regression is
identical with the correlation coefficient.

K. Pearson also discusses the linear correlation coefficient

$$\frac{1}{2}\left(\frac{S\,x\,\mathrm{sg}\,y}{S\,x\,\mathrm{sg}\,x} + \frac{S\,y\,\mathrm{sg}\,x}{S\,y\,\mathrm{sg}\,y}\right),$$

suggested by Lenz, and various other linear expressions, all similar to our
expression (1). He finds that they are all equal to his quadratic correlation
coefficient in the case of a Gaussian distribution.


2 To this second type of distribution curves also belongs y = ψ(x) where ψ(x) is the mean
of two Gaussian curves with the same origin, i.e.

$$\psi(x) = \frac{1}{2}\left(\frac{h}{\sqrt{\pi}}\,e^{-h^2x^2} + \frac{kh}{\sqrt{\pi}}\,e^{-k^2h^2x^2}\right), \qquad 1.6 < k < 3.4.$$

I owe this remark and some other valuable suggestions regarding the subject of this
paper to Mr. Myron Fuchs.
3 Op. cit.

However, their expressions were not recommended by those authors for the 
determination of correlation between quantitative variables, because— 

1. No easy and practicable methods were given for their evaluation in the 
case of grouped data. 

2. Their standard deviations were not determined. 

We now proceed to define the new formulas and to describe the methods for 
their evaluation. The proofs are furnished in the Appendix to this paper. 


Let $r_1$ and $r_2$ denote the regression coefficients of x on y and y on x respectively,
and r, as usual, the coefficient of correlation, and let $\bar{x}$ and $\bar{y}$ denote the arithmetic
means of the x's and y's. Let us take $\bar{x}$, $\bar{y}$ as the origin, so that x, y are the
deviations from the mean. We have

$$r_1 = \frac{S_{+y}\,x}{S_{+y}\,y}, \qquad r_2 = \frac{S_{+x}\,y}{S_{+x}\,x}, \qquad r = \sqrt{r_1 \times r_2}. \tag{1}$$


$S_{+y}\,x$ denotes a partial sum of the x's, this sum being extended over all the x's
of the observations whose y is positive, and the other sums have a corresponding
meaning.
It should be noted though that if data occur whose y-deviation is 0 (practically
never in a grouped table) one-half of the sum of these x's should be added to $S_{+y}\,x$.
In the $S_{+x}\,y$ a similar addition should be made in case observations occur in which x
is zero. (See Table IV.)
The formulas (1) and all following ones will be proved in the appendix to this
article.⁴
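
The partial sums of formula (1) lend themselves to direct computation. The following Python sketch is an illustration only: the function names and the five pairs of observations are hypothetical and are not taken from the tables of this paper. It forms the deviations from the means, gives half weight to any observation whose sign variable is exactly zero, and returns r₁, r₂ and r.

```python
import math


def partial_sum(values, signs):
    """Sum `values` over observations whose sign variable is positive,
    adding one-half of each value whose sign variable is exactly zero."""
    total = 0.0
    for value, s in zip(values, signs):
        if s > 0:
            total += value
        elif s == 0:
            total += 0.5 * value
    return total


def partial_sum_correlation(xs, ys):
    """Return (r1, r2, r) of formula (1) for ungrouped data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    r1 = partial_sum(dx, dy) / partial_sum(dy, dy)  # x on y
    r2 = partial_sum(dy, dx) / partial_sum(dx, dx)  # y on x
    # math.sqrt raises ValueError if r1 and r2 differ in sign;
    # the sign of r is taken from r1.
    r = math.copysign(math.sqrt(r1 * r2), r1)
    return r1, r2, r


# Hypothetical observations, chosen so that zero deviations occur.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r1, r2, r = partial_sum_correlation(xs, ys)
print(r1, r2, r)
```

For these five pairs each coefficient equals 2.5/3, while the product-moment coefficient of the same data is .8.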


4 Using $r_1$ and $r_2$ of (1) the regression lines are $y = r_2 x$ and $x = r_1 y$. They are those
straight lines which fit the data best according to the method of least squares, if the weight
of the deviations is taken inversely proportional to the absolute value of the variable.
Taking x for instance as the independent variable, $r_2$ is the value of m which minimizes

$$S\,\frac{1}{|x|}\,(y - mx)^2$$

(the sum extended over all data x, y).


The standard deviations of $r_1$ and $r_2$ are

$$\sigma_{r_1}^2 = \frac{\pi r_1^2}{2N}\,\bigl(1 + m(m - 2r)\bigr) \quad\text{where}\quad m = \frac{S_{+x}\,x}{S_{+y}\,x}, \tag{2}$$

$$\sigma_{r_2}^2 = \frac{\pi r_2^2}{2N}\,\bigl(1 + n(n - 2r)\bigr) \quad\text{where}\quad n = \frac{S_{+y}\,y}{S_{+x}\,y}.$$

We are now going to illustrate the computation of r and for this purpose
we shall use a table of Pearson's which gives the correlation between the heights
of fathers and daughters.

The totals at the right and lower end of the table are first computed and
the bracketed numbers are the sums of the numbers that precede. The
means are

$$\bar{y} = \frac{1659.5 - 1179}{1376} = \frac{480.5}{1376}, \qquad \bar{x} = \frac{1650.5 - 1390}{1376} = \frac{260.5}{1376},$$

whose signs determine on which side of the working mean to ‘quarter’ the
table. This quartering is done in Table I by the lines vv and hh. Then the
totals above the heavy horizontal separating line hh and those to the left of
the vertical separating line vv are found, e.g. 2, 4.5, 7.25, ... and .5, .5, 0, ....
Multiplying these totals by the respective class marks, we find the outside lines:
18, 36, 50.75, ... and 5.5, 5, 0, ....

$S_{-y}\,x$ is now = 1107.5 − 420.5 = 687, and an adjustment for the fact that a
working mean has been used has yet to be made. This adjustment is $\bar{x}N_{-y}$,
where $N_{-y}$ is the number of negative y's. ($N_{-y} = 728$.)
We have therefore for the adjusted values

$$S_{-y}\,x = 1107.5 - 420.5 + \frac{260.5}{1376}\cdot 728 = 825.07,$$

$$S_{-y}\,y = 1179 + \frac{480.5}{1376}\cdot 728 = 1433.21,$$

$$r_1 = .5757, \qquad r_2 = .5170, \qquad r = .546.$$

The standard deviations, according to the formulas (2), are

$$\sigma_{r_1} = .031, \qquad \sigma_{r_2} = .027.$$
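
The working-mean adjustment and the final step of the computation may be retraced as follows. This is a minimal sketch: the function name is hypothetical, the signs are taken as arranged in the text (all partial sums counted as positive quantities), and only figures quoted in the text are used.

```python
import math


def adjusted_partial_sum(raw_partial_sum, total_deviation, n_total, n_side):
    """Refer a partial sum taken about a working mean to the true mean.

    total_deviation / n_total is the mean measured from the working mean;
    n_side is the number of observations entering the partial sum.
    """
    mean_offset = total_deviation / n_total
    return raw_partial_sum + mean_offset * n_side


# Partial sum of the y's: 1179 about the working mean, 728 observations.
s_y = adjusted_partial_sum(1179.0, 480.5, 1376, 728)

# Correlation coefficient as the geometric mean of the two regressions.
r = math.sqrt(0.5757 * 0.5170)

print(round(s_y, 2), round(r, 3))
```

The adjusted sum agrees with the printed 1433.21 to within rounding, and the geometric mean reproduces r = .546.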





TABLE I
Correlation between Heights of Fathers and Daughters
x = Height of Fathers, y = Height of Daughters, in inches. Class width 1 inch.
[Table body illegible in the source.]


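The standard deviation of $r^2 = r_1 \times r_2$ has to be estimated by using the
general formula for the standard deviation of the product c of two variables
a and b:

$$\frac{\sigma_c^2}{c^2} = \frac{\sigma_a^2}{a^2} + \frac{\sigma_b^2}{b^2} + \frac{2R\,\sigma_a\sigma_b}{ab},$$

R being the correlation coefficient between a and b. Since $-1 \le R \le +1$,
substitution of these limits for R leads to the inequalities

$$\left(\frac{\sigma_a}{a} - \frac{\sigma_b}{b}\right)^2 \le \frac{\sigma_c^2}{c^2} \le \left(\frac{\sigma_a}{a} + \frac{\sigma_b}{b}\right)^2.$$

Putting $a = r_1$, $b = r_2$, $c = r^2$ we have

$$\left|\frac{\sigma_{r_1}}{r_1} - \frac{\sigma_{r_2}}{r_2}\right| \le \frac{\sigma_{r^2}}{r^2} \le \frac{\sigma_{r_1}}{r_1} + \frac{\sigma_{r_2}}{r_2}.$$

Considering the relation $\sigma_{r^2} = 2r\,\sigma_r$ and $r^2 = r_1 r_2$, we have

$$\frac{1}{2r}\,\bigl|\sigma_{r_1}r_2 - \sigma_{r_2}r_1\bigr| \le \sigma_r \le \frac{1}{2r}\,\bigl(\sigma_{r_1}r_2 + \sigma_{r_2}r_1\bigr),$$

from which we derive with sufficient approximation

$$\sigma_r < .030.$$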


A slightly different arrangement for computing r has been made in the 
following table. 


TABLE II 


Correlation between diameter of the stem and length of the longest flower petal of
Trientalis europaea* 














PS        3   15   34   45 |  30    6    2    0    0    0    0
PS       −4   −3   −2   −1 |   0    1    2    3    4    5    6   Total
1 -—4 1 | 1 
7 -3 1 4 1 1 | 7 
29 -2 1 9 146 3 | 1 30 
33 —-l 2 9 22 | a az 45 
27 0 8 19 | 20 4 #1 52 
8 11 7 | 18 12 6 4 48 
; ¢ ti = 2 2s 23 24 
3 36 4(i1 14 
4 221 3 7 
5 | 1 3 4 
6 1 1 2 





Total 4 15 34 53 | 56 30 19 12 5 5 1 234


* E. Czuber: Die statistischen Forschungsmethoden, Wien, 1921.




TABLE III

x = Diameter of the stem.
y = Length of the longest flower petal in millimeters.
Working mean, x_m = .825, y_m = 34.5.
Class width of x = .4 mm., of y = 6 mm.

            Total     P.S.                  Total     P.S.
    x      times x   times x        y      times y   times y
   −4        16        12          −4         4         4
   −3        45        45          −3        21        21
   −2        68        68          −2        60        58
   −1        53        45          −1        45        33
    0      (182)     (170)          0      (130)     (116)
    1        30         6           1        48         8
    2        38         4           2        48         2
    3        36         0           3        42
    4        20         0           4        28
    5        25         0           5        20
    6         6         0           6        12
           (155)      (10)                 (198)      (10)
  Mean      −27                             +68





















The P.S. columns are the partial sums as explained in the previous table. 
The work of multiplying the totals by the class marks and of adding them has 
been separated here from the table. 

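We obtain N = 234, $N_{-x} = 106$, $N_{-y} = 135$,

$$r_1 = \frac{170 - 10 - \frac{27}{234}\times 135}{130 + \frac{68}{234}\times 135} = .84,$$

$$r_2 = \frac{116 - 10 + \frac{68}{234}\times 106}{182 - \frac{27}{234}\times 106} = .805,$$

$$r = .82.$$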







Pearson’s coefficient for this table is r = .83. 
Finally we illustrate by a small non-grouped table where the partial sums 
can be written down immediately. 


TABLE IV
Correlation between Ages of Husband and Wife

  Age of     Age of    Deviation   Deviation
  Husband     Wife      Husband      Wife

    22         18         −8          −8
    24         20         −6          −6
    26         20         −4          −6
    26         24         −4          −2
    27         22         −3          −4
    27         24         −3          −2
    28         27         −2          +1
    28         24         −2          −2
    29         21         −1          −5
    ..         25         ..          −1
    ..         29         ..          +3
    ..         32         ..          +6
    ..         27         ..          +1
    ..         27         ..          +1
    ..         30         ..          +4
    ..         27         ..          +1
    ..         30         ..          +4
    ..         31         ..          +5
    ..         30         ..          +4
    ..         32         ..          +6

  Ave. 30      26

Here 0-deviations occur in the third column. Hence

$$S_{+x}\,y = 26 + \tfrac{1}{2}\times 8 = 30, \qquad S_{+x}\,x = 33, \qquad S_{+y}\,x = 31, \qquad S_{+y}\,y = 36,$$

$$r_1 = .86, \qquad r_2 = .91, \qquad r = .88 \qquad (\text{Pearson's } r = .86)$$


Appendix 
Proof of formula (1), page 1. The following notations will be used:
$(f(x))^\circ$ = probable value of f(x),
$(f(y))^\circ_x$ = probable value of f(y) for a fixed x,
sg x = sign of x = +1 for x > 0, −1 for x < 0; sg x = 0 if x = 0.


5 See page 7. 














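The assumption of linear regression means that

$$y_x^\circ - y^\circ = r_{y:x}\,(x - x^\circ). \tag{4}$$

We multiply both sides of (4) by some arbitrary function φ(x) of x and get

$$(y_x^\circ - y^\circ)\,\varphi(x) = r_{y:x}\,(x - x^\circ)\,\varphi(x).$$

Both sides are functions of x. We shall take their probable values for all x's.
Now, for a fixed x, $y_x^\circ\,\varphi(x) = (y\varphi(x))_x^\circ$ and the probable value of $(y\varphi(x))_x^\circ$ for
all x's is equal to the total probable value $(y\varphi(x))^\circ$. So we have

$$(y\varphi(x))^\circ - (y^\circ\varphi(x))^\circ = r_{y:x}\,((x - x^\circ)\varphi(x))^\circ,$$

$$r_{y:x} = \frac{((y - y^\circ)\varphi(x))^\circ}{((x - x^\circ)\varphi(x))^\circ}. \tag{5}$$

If now we take $x^\circ$, $y^\circ$ as the origin, we get

$$r_{y:x} = \frac{(y\varphi(x))^\circ}{(x\varphi(x))^\circ}$$

and similarly

$$r_{x:y} = \frac{(x\varphi_1(y))^\circ}{(y\varphi_1(y))^\circ},$$

where $\varphi_1$ is another arbitrary function.
Replacing the probable values by the respective arithmetic means we get

$$r_{y:x} = \frac{S\,y\varphi(x)}{S\,x\varphi(x)} \quad\text{and}\quad r_{x:y} = \frac{S\,x\varphi_1(y)}{S\,y\varphi_1(y)} \tag{6}$$

with $\bar{x}$, $\bar{y}$ as the origin.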

By a suitable choice of the still arbitrary functions φ and φ₁, we may derive
all the various expressions for regression coefficients. Taking, for instance,
φ(x) = x, φ₁(y) = y, we get Pearson's expressions. Taking φ(x) = sg(x − a₁),
φ₁(y) = sg(y − a₂), a₁ and a₂ being constants, we have

$$r_{y:x} = \frac{S\,y\,\mathrm{sg}(x - a_1)}{S\,x\,\mathrm{sg}(x - a_1)}, \qquad r_{x:y} = \frac{S\,x\,\mathrm{sg}(y - a_2)}{S\,y\,\mathrm{sg}(y - a_2)} \tag{7}$$

and if we make a₁ = a₂ = 0

$$r_{y:x} = \frac{S\,y\,\mathrm{sg}\,x}{S\,x\,\mathrm{sg}\,x}, \qquad r_{x:y} = \frac{S\,x\,\mathrm{sg}\,y}{S\,y\,\mathrm{sg}\,y}. \tag{8}$$
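
The role of the arbitrary function φ can be made concrete in a few lines. The sketch below is illustrative only: the function names and the five observations are hypothetical. It evaluates the quotient S yφ(x) / S xφ(x) about the means, once with φ(x) = x (the product-moment regression) and once with φ(x) = sg x (the linear expression).

```python
def regression_coefficient(xs, ys, phi):
    """Regression of y on x as S y*phi(x) / S x*phi(x), with deviations
    measured from the arithmetic means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((y - mean_y) * phi(x - mean_x) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) * phi(x - mean_x) for x in xs)
    return numerator / denominator


def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


# Hypothetical observations, not taken from the tables of the paper.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

pearson_slope = regression_coefficient(xs, ys, lambda v: v)
sign_slope = regression_coefficient(xs, ys, sign)
print(pearson_slope, sign_slope)
```

With these data the first choice gives 8/10 and the second 5/6, so the two expressions differ in small samples although both estimate the same regression under linearity.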
Since Sx = Sy = 0, we can add Sy or Sx to the numerators and denominators.
Adding Sy to the numerator, Sx to the denominator and multiplying both
sides of the fraction by ½ we get

$$r_{y:x} = \frac{\tfrac{1}{2}\,S\,y\,(\mathrm{sg}(x - a_1) + 1)}{\tfrac{1}{2}\,S\,x\,(\mathrm{sg}(x - a_1) + 1)}. \tag{9}$$

Instead of (9) we can write

$$r_{y:x} = \frac{S_{x > a_1}\,y + \tfrac{1}{2}\,S_{x = a_1}\,y}{S_{x > a_1}\,x + \tfrac{1}{2}\,S_{x = a_1}\,x} \tag{10}$$

since the operations of (9) multiply the y ordinates by 0, ½, 1 according as the
x's are less than, equal to, or greater than a₁.
The expression (10), with a suitable choice of a₁, should be used for the purpose
of numerical calculation of r. For instance, when calculating r from the data
of Table IV, we took a₁ = a₂ = 0 and had

$$r_{y:x} = \frac{S_{+x}\,y + \tfrac{1}{2}\,S_{x = 0}\,y}{S_{+x}\,x}.$$

When dealing with data which are arranged in a grouped table (Tables I
and II) we take a₁ equal to the x-ordinate of that classline which is nearest to
the mean⁶ (in Table I, $a_1 = .5 - \frac{260.5}{1376}$). With that choice of a₁ the sums
$S_{x = a_1}$ disappear and the sums $S_{x > a_1}$ are equivalent to the corresponding sums $S_{+x}$.
Hence we have

$$r_{y:x} = \frac{S_{+x}\,y}{S_{+x}\,x} \quad\text{and similarly}\quad r_{x:y} = \frac{S_{+y}\,x}{S_{+y}\,y}. \tag{11}$$

Instead of (9) we can also write

$$r_{y:x} = \frac{\tfrac{1}{2}\,S\,y\,(\mathrm{sg}(x - a_1) - 1)}{\tfrac{1}{2}\,S\,x\,(\mathrm{sg}(x - a_1) - 1)}. \tag{9a}$$

This leads to

$$r_{y:x} = \frac{S_{-x}\,y}{S_{-x}\,x}, \qquad r_{x:y} = \frac{S_{-y}\,x}{S_{-y}\,y}. \tag{11a}$$


6 It is desirable to choose the absolute values of the a's small so that the maximum number
of data enter into the calculation of r. However, to take a₁ = a₂ = 0 would necessitate a
division of the middle arrays of a grouped table, a laborious process. Hence the choice
of the a's as described above.





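Proof of the standard deviations of Formula (2).
In my article on standard deviations and correlations of moments⁷ the standard
deviations of the expressions used in this article have been derived.

In the following, the notation of the Metron article just referred to will be
used. We use the symbols:

$$P_{m,n} = \frac{1}{N}\sum x^m y^n, \qquad P_{|m,n} = \frac{1}{N}\sum x^m\,\mathrm{sg}\,x\;y^n,$$

$$P_{m,|n} = \frac{1}{N}\sum x^m y^n\,\mathrm{sg}\,y, \qquad P_{|m,|n} = \frac{1}{N}\sum x^m\,\mathrm{sg}\,x\;y^n\,\mathrm{sg}\,y.$$

The summations indicated extend over all observations. The true or probable
values of the same expressions are indicated by using p instead of P.

$$r_1 = \frac{P_{1,|0}}{P_{0,|1}}$$

We derive the standard deviations by defining the deviations as first variations.

$$\log r_1 = \log P_{1,|0} - \log P_{0,|1}$$

$$\frac{\delta r_1}{r_1} = \frac{\delta P_{1,|0}}{P_{1,|0}} - \frac{\delta P_{0,|1}}{P_{0,|1}}$$

$$\sigma_{r_1}^2 = \bigl[(\delta r_1)^2\bigr]^\circ = (r_1)^2\left[\left(\frac{\delta P_{1,|0}}{P_{1,|0}} - \frac{\delta P_{0,|1}}{P_{0,|1}}\right)^2\right]^\circ \tag{12}$$

The probable values of the terms on the right hand side of the last equation are
derived on pages 17-19 and listed on pages 32-33 of the Metron article referred
to. The proofs, which imply essentially a process of variation of Stieltjes
integrals, will not be given here. From pages 32-33 we take

$$\bigl[(\delta P_{1,|0})^2\bigr]^\circ = \frac{1}{N}\,(p_{2,0} - p_{1,|0}^2), \qquad \bigl[(\delta P_{0,|1})^2\bigr]^\circ = \frac{1}{N}\,(p_{0,2} - p_{0,|1}^2),$$

$$\bigl[\delta P_{1,|0}\,\delta P_{0,|1}\bigr]^\circ = \frac{1}{N}\,(p_{1,1} - p_{1,|0}\,p_{0,|1}) \tag{13}$$

so that

$$\sigma_{r_1}^2 = \frac{r_1^2}{N}\left(\frac{p_{2,0}}{p_{1,|0}^2} + \frac{p_{0,2}}{p_{0,|1}^2} - \frac{2\,p_{1,1}}{p_{1,|0}\,p_{0,|1}}\right). \tag{14}$$

Assuming Gaussian distribution, we can put

$$p_{2,0} = \frac{\pi}{2}\,p_{|1,0}^2, \qquad p_{0,2} = \frac{\pi}{2}\,p_{0,|1}^2, \qquad p_{1,1} = r\sqrt{p_{2,0}\,p_{0,2}} = r\,\frac{\pi}{2}\,p_{|1,0}\,p_{0,|1}.$$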








7 Felix Bernstein: “Die mittleren Fehlerquadrate und Korrelationen der Potenzmomente
und ihre Anwendung auf Funktionen der Potenzmomente,” Metron, Vol. X, N. 3
(Nov. 1932).











Hence

$$\sigma_{r_1}^2 = \frac{\pi}{2N}\,r_1^2\left(1 + \frac{p_{|1,0}^2}{p_{1,|0}^2} - 2r\,\frac{p_{|1,0}}{p_{1,|0}}\right). \tag{15}$$

Replacing the theoretical values by their corresponding empirical values,
we have

$$\sigma_{r_1}^2 = \frac{\pi r_1^2}{2N}\,(1 + m^2 - 2rm) \quad\text{where}\quad m = \frac{S\,x\,\mathrm{sg}\,x}{S\,x\,\mathrm{sg}\,y}. \tag{16}$$

The formula for $\sigma_{r_1}$ has been derived here for the value of $r_1$ as given by (8),
i.e. $r_1 = \frac{S\,x\,\mathrm{sg}\,y}{S\,y\,\mathrm{sg}\,y}$. In fact, we used $r_1 = \frac{S\,x\,\mathrm{sg}(y - a)}{S\,y\,\mathrm{sg}(y - a)}$ in the examples in the
article, and a had some value absolutely smaller than .5. To use equation (16)
for the standard deviation of $r_1$ is within the limits of the required degree of
accuracy; hence we shall disregard the difference. In a later paper the standard
deviation of $r_1$ for any a will be derived by using the method described in the
Metron article, for a different purpose.

To prove the statement in the footnote to page 7:

To find the value of $r_2$ that makes $S\,f(x)\,(y - r_2x)^2$ a minimum.
By differentiating we get

$$S\,f(x)\,(y - r_2x)\,x = 0, \qquad r_2 = \frac{S\,x f(x)\,y}{S\,x f(x)\,x}.$$

For f(x) = 1 we get Pearson's coefficient.
For $f(x) = \frac{1}{|x|}$ (x ≠ 0) we get

$$r_2 = \frac{S\,y\,\mathrm{sg}\,x}{S\,x\,\mathrm{sg}\,x}.$$
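
The minimum property just proved can be verified on a small set of figures. The sketch below is illustrative only: the function names and the four observations (already measured from their means, with no zero x) are hypothetical. It computes the weighted least-squares slope for the weight 1/|x|, compares it with the quotient of partial sums, and checks that the weighted sum of squares rises when the slope is displaced.

```python
def weighted_slope(xs, ys, weight):
    """Value of m that minimizes S weight(x) * (y - m*x)**2."""
    numerator = sum(weight(x) * x * y for x, y in zip(xs, ys))
    denominator = sum(weight(x) * x * x for x in xs)
    return numerator / denominator


def weighted_loss(m, xs, ys, weight):
    """Weighted sum of squared residuals about the line y = m*x."""
    return sum(weight(x) * (y - m * x) ** 2 for x, y in zip(xs, ys))


def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def inverse_abs(x):
    """Weight inversely proportional to the absolute value of x (x != 0)."""
    return 1.0 / abs(x)


# Hypothetical deviations from the means.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-3.0, -1.0, 1.0, 3.0]

m_linear = weighted_slope(xs, ys, inverse_abs)
m_partial = sum(y * sign(x) for x, y in zip(xs, ys)) / sum(abs(x) for x in xs)
m_pearson = weighted_slope(xs, ys, lambda x: 1.0)
print(m_linear, m_partial, m_pearson)
```

Here the weighted slope and the quotient of partial sums both equal 8/6, while unit weights give the product-moment value 1.4.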


New York University,
Departments of Anatomy of the Graduate School and the College of Dentistry.













METHODS OF OBTAINING PROBABILITY DISTRIBUTIONS¹


By Burton H. Camp 


The emphasis of this paper will be on method. Special results will be cited 
in order to illustrate the methods rather than to summarize achievement in the 
field; for that has been done already by Rider (1930, 1935), Irwin (1935), and
Shewhart (1933) in recent surveys. The purpose is to describe and to illustrate
most of the methods that have been used to determine exact probability
distributions, and to show that they are all derivable from one fundamental theorem.
In order to prove this unity in a simple manner, it will be desirable to omit from
consideration methods which are essentially ingenious forms of counting, such 
as are used in sampling without replacements from finite universes, and in 
finding the sampling distribution of a percentile. 

The general problem to be discussed may be stated as follows: N individuals
$(t_1, \dots, t_N)$ are drawn, one at a time with replacements, from a universe whose
probability distribution is φ(t). A certain single valued function of the t's is
formed. This is called a parameter of the sample, and is frequently also,
but not necessarily, a useful estimate of the corresponding parameter of the
universe. The problem is to find its probability distribution, f(x). As usual,
a probability distribution is a function which is required to be defined, except 
perhaps at a set of measure zero, throughout the infinite domain of its variables; 
it is nowhere negative, and its integral over its domain is unity. 

Most of the more recent developments of the theory relate to a more general 
form of this problem. Instead of N individuals, there are N sets of n individuals 
in each set, and these sets are drawn respectively from M (M ≤ N) universes,
each of which is described by a function of n independent variables, thus:

$$\varphi^{(i)}(t_1, \dots, t_n); \qquad (i = 1, \dots, M). \tag{1}$$

Instead of a single parameter there are P parameters, and each is a single valued
function of the observed values of the nN individuals in the sample, thus:

$$x_i = g_i\bigl(t_1^{(1)}, \dots, t_n^{(1)};\ \dots;\ t_1^{(N)}, \dots, t_n^{(N)}\bigr); \qquad (i = 1, \dots, P). \tag{2}$$


The first method to be described is fundamental and will be designated as 

THEOREM I. Let it be required that each g as described in (2) be not only
single valued but also constant at most in a set of measure zero in the nN-way
space of the t's. Then

$$\int_p f(x_1, \dots, x_P)\,dX = \int_q \varphi\bigl(t_1^{(1)}, \dots, t_n^{(N)}\bigr)\,dT \tag{I}$$


1 Presented to the American Mathematical Society at a meeting devoted to expository 
papers on the theory of statistics, April 11, 1936. 


where X is the space of x's and T the space of the t's, p is any measurable set
of points in X, and q is the set in T for which g is in p. Often p is the P dimensional
cube $(x_i + \Delta x_i,\ i = 1, \dots, P)$ at the point $(x_1, \dots, x_P)$ and then q is
the set where

$$x_i \le g_i \le x_i + \Delta x_i \tag{3}$$

and φ is the simultaneous distribution of the sets of t's,

$$\varphi = \bigl[\varphi^{(1)}(t_1^{(1)}, \dots, t_n^{(1)})\bigr] \cdots \bigl[\varphi^{(N)}(t_1^{(N)}, \dots, t_n^{(N)})\bigr]. \tag{4}$$

In this $\varphi^{(i)}$ is the universe from which the i-th set of t's is drawn. Obviously,
if N > M, some of the $\varphi^{(i)}$'s are identical, and then it is assumed that the several
sets are drawn independently. Often, all of the N sets of t's are drawn from
the same universe. Then M = 1 and all these φ's are identical, and (4) becomes

$$\varphi = \bigl[\varphi(t_1^{(1)}, \dots, t_n^{(1)})\bigr] \cdots \bigl[\varphi(t_1^{(N)}, \dots, t_n^{(N)})\bigr].$$


In the special case where there is but one parameter (P = 1) and but one
individual in the sample (n = N = 1), and p is an interval, formula (I) becomes

$$\int_x^{x + \Delta x} f(x)\,dx = \int_q \varphi\,dt; \tag{Ia}$$

and in the very special case where it is also true that q is an interval it becomes

$$f(x) = \varphi(t)\,\left|\frac{dt}{dx}\right|, \tag{Ib}$$

provided also that certain derivatives (to be specified later in the proof) exist,
where t is now the inverse solution of the equation,

$$x = g(t). \tag{5}$$

The proof of formula (I) is immediate, if one is willing to assume the existence
of the probability distribution f; for then the left side is by definition the
probability that the x's lie in p, and this is also the meaning of the right side of (I).
(Ia) can be proved without assuming initially the existence of f(x), for then
the existence of f(x) can be inferred from the existence of the right side of (Ia),
because f(x) may be set equal (except perhaps at a set of measure zero) to the
upper right hand derivative, with respect to Δx (Δx is a variable, and x is fixed),
of $\int_q \varphi\,dt$, provided that one adds the condition that this derivative is nowhere
infinite. The point at issue here is merely the existence of a primitive for a
monotone increasing function of Δx. (Ib) may be derived from (Ia) by taking
the derivative of both sides with respect to Δx, if the derivatives are continuous.
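
Formula (Ib) may be illustrated by a simple change of variable. The sketch below rests on assumptions chosen here for illustration and not taken from the text: the universe is taken to be the standard normal law and the function is g(t) = exp(t), so that t = log x and |dt/dx| = 1/x. The code then checks (Ia) over one interval by integrating the density given by (Ib) and comparing with the probability that t lies in the corresponding interval.

```python
import math


def phi(t):
    """Density of the assumed universe (standard normal law)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def phi_cdf(t):
    """Cumulative probability of the assumed universe."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def f(x):
    """Distribution of x = exp(t) from (Ib): phi(t) * |dt/dx|, t = log x."""
    return phi(math.log(x)) / x


def midpoint_integral(func, a, b, steps=10000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


# Left side of (Ia): integral of f over the interval p = (1, 2).
left_side = midpoint_integral(f, 1.0, 2.0)

# Right side of (Ia): probability that t lies in q = (log 1, log 2).
right_side = phi_cdf(math.log(2.0)) - phi_cdf(math.log(1.0))

print(left_side, right_side)
```

The two sides agree to the accuracy of the quadrature, which is all that (Ia) asserts in this very special case.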

Theorem I, in these various forms, is used a great deal, especially in the last
form (Ib). This affords one freedom to choose the most desirable function
for purposes of tabulation. R. A. Fisher's z distribution, a logarithm, is an
important illustration. Many authors have been interested in so choosing the
function that its distribution shall be normal. They include several of the
older writers, and more recently H. L. Rietz (1921, 1927), and G. A. Baker 
(1932, 1934). However, the theorem is of special importance in the theory, 
for all the other principal methods of obtaining probability distributions are 
essentially corollaries of it. These corollaries will be called Theorems II, III, 
and IV. 

THEOREM II. Let $\bar{p}$ (the measure of p) and $\bar{q}$ (the measure of q) be
infinitesimals of the same order and let both the oscillation of f (i.e. maximum f −
minimum f) in p and the oscillation of φ in q be infinitesimals; then (I) may be
written,

$$f\,\bar{p} = \varphi\,\bar{q}, \tag{II}$$

where f applies to any point of p and φ to the corresponding point of q. This
equation (II) is an approximate equation in the sense that differences of higher
order than those retained are neglected. In particular, with the conditions
used in formula (Ia), equation II becomes

$$f\,\Delta x = \varphi\,\bar{q}.$$


The left side of (II) is an approximation to the probability sought. The right
side shows that, in order to evaluate it, one need only find the volume in T space
of the differential element q and multiply it by the value of φ in q. Formula (II)
expresses the so-called geometrical method used by many authors, e.g., by
R. A. Fisher (1915, 1925), by Wishart (1928), and by Hotelling (1925, 1927).
The chief difficulty in connection with it is in finding the volume of nN-dimensional
q. In order to display the advantages and disadvantages of this method
we shall pause at this point and look at a concrete example.²

Let two individuals $(t_1, t_2)$ be drawn independently from a normal universe
and consider the simultaneous distribution f(x, y) of the sum, $x = t_1 + t_2$,
and product, $y = t_1t_2$, the mean of the universe being chosen as the origin.
Here N = 2, n = 1, M = 1, and so,

$$\varphi = \frac{1}{2\pi\sigma^2}\,e^{-\frac{1}{2\sigma^2}(t_1^2 + t_2^2)} = \frac{1}{2\pi\sigma^2}\,e^{-\frac{1}{2\sigma^2}(x^2 - 2y)}. \tag{6}$$

The point set q is the area lying between the two adjacent hyperbolae,

$$t_1t_2 = y, \qquad t_1t_2 = y + \Delta y,$$

and also between the two adjacent lines,

$$t_1 + t_2 = x, \qquad t_1 + t_2 = x + \Delta x,$$

where Δx and Δy are infinitesimals and are equal. This area may be computed
by simple integration and is:

$$\bar{q} = \frac{2\,\Delta x\,\Delta y}{\sqrt{x^2 - 4y}} \ \text{ if } x^2 > 4y, \qquad \bar{q} = 0 \ \text{ if } x^2 < 4y.$$

Hence II gives us immediately the desired result:

$$f(x, y)\,\Delta x\,\Delta y = \frac{1}{\pi\sigma^2}\,e^{-\frac{x^2 - 2y}{2\sigma^2}}\,\frac{1}{\sqrt{x^2 - 4y}}\,\Delta x\,\Delta y \ \text{ if } x^2 > 4y, \qquad = 0 \ \text{ if } x^2 < 4y.$$


2 See also C. C. Craig (1936). Craig uses another method to be explained later (formula
IIIa).

If $x^2 = 4y$, $\bar{q}$ is an infinitesimal of lower order than $\bar{p} = (\Delta x)^2$, and so Theorem II
does not apply. In this case we must go back to Theorem I, and from that we
can learn that the probability,

$$\int_p f\,dx\,dy,$$

is an infinitesimal of the first order if $\bar{p} = \Delta x\,\Delta y = (\Delta x)^2$ is of the second order.
Hence it cannot be approximately represented by a finite number times $\bar{p}$.
The oscillation of f in p is infinite. The form of the surface f(x, y) is interesting.
The ordinates rise to infinity on the contour of the parabola $x^2 = 4y$, and vanish
within it. The surface is symmetrical with respect to the plane x = 0, but
not with respect to the plane y = 0. However, it is clear that the total
probability of any given product, y (i.e. the probability of this y for all possible
values of x), is the same as the total probability of −y; hence

$$\int_{-\infty}^{\infty} f(x, y)\,dx = \int_{-\infty}^{\infty} f(x, -y)\,dx,$$

and the corresponding formulae,

$$\frac{2}{\pi\sigma^2}\,e^{\frac{y}{\sigma^2}} \int_{2\sqrt{y}}^{\infty} \frac{e^{-\frac{x^2}{2\sigma^2}}}{\sqrt{x^2 - 4y}}\,dx \qquad (y > 0),$$

and

$$\frac{2}{\pi\sigma^2}\,e^{\frac{y}{\sigma^2}} \int_{0}^{\infty} \frac{e^{-\frac{x^2}{2\sigma^2}}}{\sqrt{x^2 - 4y}}\,dx \qquad (y < 0),$$

must be equal for equal absolute values of y; both may be reduced to the single form

$$F(y) = \frac{1}{\pi\sigma^2} \int_0^{\infty} e^{-\frac{1}{2\sigma^2}\left(u^2 + \frac{y^2}{u^2}\right)}\,\frac{du}{u}, \qquad \text{if } y \ne 0.$$


This is the probability distribution of y. 
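
The agreement of the two forms can be checked by numerical quadrature. The sketch below is an illustration only: the function names, the substitutions u = √|y| eˢ and x = 2√y cosh s (which remove the singular factors), and the test values y = .5, σ = 1 are choices made here and not taken from the text.

```python
import math


def trapezoid(func, a, b, steps):
    """Trapezoidal approximation to the integral of func over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for k in range(1, steps):
        total += func(a + k * h)
    return h * total


def product_density_single_form(y, sigma=1.0, span=8.0, steps=4000):
    """F(y) from the single form, after putting u = sqrt(|y|) * exp(s),
    so that du/u = ds and u**2 + y**2/u**2 = 2*|y|*cosh(2*s)."""
    a = abs(y) / sigma ** 2
    integral = trapezoid(lambda s: math.exp(-a * math.cosh(2.0 * s)),
                         -span, span, steps)
    return integral / (math.pi * sigma ** 2)


def product_density_from_joint(y, sigma=1.0, span=8.0, steps=4000):
    """F(y) for y > 0 by integrating f(x, y) over x, after putting
    x = 2*sqrt(y)*cosh(s), so that dx / sqrt(x**2 - 4*y) = ds."""
    if y <= 0:
        raise ValueError("this form is written for y > 0 only")
    b = 2.0 * y / sigma ** 2
    integral = trapezoid(lambda s: math.exp(-b * math.cosh(s) ** 2),
                         0.0, span, steps)
    return 2.0 / (math.pi * sigma ** 2) * math.exp(y / sigma ** 2) * integral


single = product_density_single_form(0.5)
joint = product_density_from_joint(0.5)
print(single, joint)
```

Both evaluations give about .294 at y = .5 with σ = 1, and the single form returns the same value at y = −.5, in accordance with the symmetry noted above.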
With this example before us, let us now reconsider the theory: 
(i) The requirement (in II) that the oscillation of φ be infinitesimal in q
will be satisfied if one can show that φ may be expressed as a continuous function
of the parameters $(x_1, x_2, \dots, x_P)$. In our example these parameters were
x and y and φ was so expressible (6). But if we had tried initially to find by
means of (II) the distribution of the product y, independently of what values
x might have, we should have been stopped at this point, because φ is not
expressible in terms of y alone. We should also have been stopped by the
requirement that $\bar{q}$ be infinitesimal of order Δy, for q would have been the
space between two hyperbolas and its area for any fixed Δy (Δy > 0) would have
been infinite. But, when thus stopped at that first point, it would have been
clearly indicated to us that the distribution of y might have been found via
the detour of finding the simultaneous distribution of both x and y, because
an attempt to express φ in terms of y would have led to the given expression in
terms of both x and y. For a similar reason R. A. Fisher (1925) was able to
find the distribution of the variance by finding first the simultaneous distribution 
of the variance and the mean. Also, he was thus able to find the distribution 
of the coefficient of correlation by finding first the simultaneous distribution of 
all the first and second order moments. 

(ii) A distinct advantage of this method is that $q$ is independent of the
universe $\phi$, so that once found it may be used in connection with any universe
which satisfies the condition that it can be expressed as a continuous function
of the parameters. Thus, the distribution of the sum and product in our
example may equally well be found for the universe described by the Type III
curve, $A t e^{-at}$ $(t > 0)$. For, then

$$\phi = A^2 t_1 t_2\, e^{-a(t_1 + t_2)} = A^2 y\, e^{-ax},$$

and so, using one-half of the same $q$ as before, since now $x, y \geq 0$,

$$f(x, y) = \begin{cases} A^2 y\, e^{-ax}\, \dfrac{1}{\sqrt{x^2 - 4y}} & \text{if } x^2 > 4y, \\[1ex] 0 & \text{if } x^2 < 4y. \end{cases}$$

From this, $F(y)$ can be found by integration (cf. Kullback, 1934):

$$F(y) = A^2 y \int_{2\sqrt{y}}^{\infty} \frac{e^{-ax}}{\sqrt{x^2 - 4y}}\,dx = \frac{A^2 y}{2} \int_0^{\infty} e^{-a\left(u + \frac{y}{u}\right)}\, \frac{du}{u}.$$


As another illustration, consider a normal universe of $n$ intercorrelated variables
in which all the total intercorrelations are equal to $r$ (e.g., the statures of
$n$ brothers) and let the sample be a single group of $n$ (one individual for each
variable).

$$\phi = \frac{1}{(2\pi)^{n/2} R^{1/2}}\, e^{-\frac{1}{2R}\left[k_1 \sum_i t_i^2 \,+\, 2k_2 \sum_{i<j} t_i t_j\right]}$$

where $R = (1 - r)^{n-1}[1 + (n - 1)r]$, $k_1 = (1 - r)^{n-2}[1 + (n - 2)r]$, and
$k_2 = -r(1 - r)^{n-2}$. Suppose one wishes to find the simultaneous distribution




of the variance $x$ and the mean $y$ for such samples.³ Since for Student's problem
Fisher has found the value of $q$ for this $x$ and $y$ to be

$$q = c\,x^{\frac{n-3}{2}}\,\Delta x\,\Delta y,$$

their distribution $f(x, y)$ for this universe may be written down immediately.
In terms of $x$ and $y$ the bracket in the exponent of $\phi$ is $y^2(k_1 n - k_2 n + k_2 n^2)
+ xn(k_1 - k_2)$, and so $f(x, y)$ is the product of $q$ and this form of $\phi$:

$$f(x, y) = K e^{E} x^{\frac{n-3}{2}}, \qquad E = -\frac{1}{2R}\left[(k_1 n - k_2 n + k_2 n^2)\,y^2 + n(k_1 - k_2)\,x\right].$$
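The reduction of the bracket in the exponent to a function of the variance $x$ and the mean $y$ alone may be checked numerically. The sketch below assumes that the variance is taken with divisor $n$ and that the cross term runs over pairs $i < j$; the function names are hypothetical, and the constants $k_1$, $k_2$ are those of an equicorrelated universe with unit variances.

```python
def equicorrelation_constants(n, r):
    """k1 and k2 for n variates with unit variances and common correlation r."""
    k1 = (1 - r) ** (n - 2) * (1 + (n - 2) * r)
    k2 = -r * (1 - r) ** (n - 2)
    return k1, k2

def bracket_direct(t, k1, k2):
    """k1 * sum(t_i^2) + 2 * k2 * sum_{i<j} t_i t_j, from the raw sample."""
    n = len(t)
    squares = sum(v * v for v in t)
    cross = sum(t[i] * t[j] for i in range(n) for j in range(i + 1, n))
    return k1 * squares + 2.0 * k2 * cross

def bracket_reduced(t, k1, k2):
    """The same bracket written in terms of the mean y and the variance x."""
    n = len(t)
    y = sum(t) / n                          # mean
    x = sum((v - y) ** 2 for v in t) / n    # variance, divisor n (assumed)
    return y * y * (k1 * n - k2 * n + k2 * n * n) + x * n * (k1 - k2)
```

For any sample the two functions should agree up to rounding, which is what permits $\phi$ to be expressed as a continuous function of $x$ and $y$.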


(iii) Another attribute of this method is that it sometimes lends itself to easy
extensions from a simple case where there is only one restriction ($N - 1$ degrees
of freedom) to similar cases when there are more restrictions. Thus R. A.
Fisher (1924) proceeded from the variance of a sample from a single universe
to the variance from a set of universes, as required in the theory of analysis of
variance; and thus also (1915) he had proceeded from the distribution of $r$ to
that of multiple $R$; and Hotelling (1927) showed how these distributions could
be obtained when the values of each variate were themselves intercorrelated
(as in a time series) and not merely correlated with values of the other variates.

THEOREM III. Now let us consider again the fundamental form (I). For
convenience let $nN = m$. If the conditions will not permit us to write the right
side in the form in (II), it is still possible that we may be able to find that
$(m + 1)$-dimensional volume by some other method. In particular, whenever
it is possible to iterate the integral once we have the formula:

$$\int_p f\,dX = \int_{T'} dT' \int_{q_m} \phi\,dt_m, \tag{III}$$

where $q_m$ is the section of $q$ by $t_m$ space at the point $(t_1, \cdots, t_{m-1})$ of $T'$ space,
$T'$ space being the space of the $(t_1, \cdots, t_{m-1})$ coordinates. With added conditions
one may deduce from (III), for the case where there is but a single parameter
$x$, the approximate equation:

$$f\,dx = dx \int_{T'} dT' \cdot \phi(t_1, \cdots, t_m)\,\frac{dt_m}{dx}, \tag{IIIa}$$

in which $t_m$ is supposed to have been expressed in terms of the other coordinates
by solving the equation $x = g(t_1, \cdots, t_m)$. It is an approximate equation in
the same sense as (II) was. Sufficient conditions for this change in the left
side of (III) have already been mentioned in discussing (II). The propriety
of making the corresponding change in the right hand side may be left for
determination when the form of $\phi$ is given. It will perhaps be sufficient here
to point out that our earlier example illustrates both the case where this change


3 A special case of a more general problem solved first by R. A. Fisher. 




is permissible and where it is not. For, let it be required to find the distribution
$f(y)$ of the product $y = t_1 t_2$ without reference to the sum, $t_1 + t_2$. Formula
(III) yields

$$\int_y^{y+\Delta y} f(y)\,dy = 2\int_0^{\infty} dt_1 \int_{y/t_1}^{(y+\Delta y)/t_1} dt_2 \cdot \frac{1}{2\pi\sigma^2}\, e^{-\frac{1}{2\sigma^2}\left(t_1^2 + t_2^2\right)}. \tag{7}$$

This is valid for every value of $y$ including $y = 0$. If $y \neq 0$, we may change
the right hand side as in (IIIa) and obtain as the probability that $y$ is in the
interval $(y, y + \Delta y)$:

$$\int_y^{y+\Delta y} f(y)\,dy = \frac{\Delta y}{\pi\sigma^2} \int_0^{\infty} e^{-\frac{1}{2\sigma^2}\left(t_1^2 + \frac{y^2}{t_1^2}\right)}\, \frac{dt_1}{t_1} + \epsilon, \tag{8}$$

where $\epsilon$ is a differential of higher order than $\Delta y$. This may be proved by computing
the difference between the value of (7) when $t_2$ has constantly the value
$(y + \Delta y)/t_1$ and when it has constantly the value $y/t_1$. If $y = 0$ this change
in the right side of (7) is not valid; it is easily seen that in this case the integral
on the right of (8) is infinite. It may be shown, however, in this case that


(9) I fy)dy = 3- -i = as 


and that this is an infinitesimal, and that it is of order as small as one. 

Many authors think of (IIIa) as the fundamental formula in the theory of 
probability distributions. One of the simplest and earliest applications of it 
was to establish the so-called reproductive property of the normal law: that 
the sum of two variates is distributed normally if each is distributed normally. 
Jackson (1935) has used it to establish a similar property for two Type III 
distributions which have the same exponent of $e$. Usually this integral is
difficult to evaluate when $N > 2$ because of the unsymmetrical form into
which it is cast, but when $N = 2$ and there is but one parameter, (IIIa) is
perhaps the most convenient of all the formulae.
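The reproductive property of the normal law mentioned above may serve as a small illustration of (IIIa). In the sketch below, which is an assumption-laden example rather than the method of any cited author, the parameter is $x = t_1 + t_2$, so that $t_2 = x - t_1$ and $dt_2/dx = 1$; the two variates are taken as independent with zero means, and the integral over $t_1$ is evaluated by the trapezoid rule. The function names are hypothetical.

```python
import math

def normal_pdf(t, sigma):
    """Normal density with zero mean and standard deviation sigma."""
    return math.exp(-t * t / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def sum_density(x, sigma1, sigma2, steps=4000):
    """Density of x = t1 + t2 by integrating phi(t1, x - t1) over t1."""
    half_width = 10.0 * max(sigma1, sigma2)   # the integrand is negligible outside this range
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        t1 = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(t1, sigma1) * normal_pdf(x - t1, sigma2)
    return total * h
```

The value of `sum_density(x, s1, s2)` should agree closely with `normal_pdf(x, math.sqrt(s1**2 + s2**2))`, that is, with a normal law whose variance is the sum of the two variances.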

THEOREM IV. An exceedingly useful formula is obtainable from (I) in the
following manner. Let $\theta(x_1, \cdots, x_P;\ \alpha_1, \cdots, \alpha_Q)$ be a finite single valued
function of the old parameters $(x)$ and of some new parameters $(\alpha)$. Subject
to general conditions to be stated we may write:

$$\int_X \theta f\,dX = \int_T \theta' \phi\,dT, \tag{IV}$$

an identity with respect to each $\alpha$, where $\theta'$ is the result of substituting (2)
for the $x$'s in $\theta$.

Since this theorem has not been proved in this general form, an outline of 
the proof will be given. Sufficient conditions are: 

(a) All the integrals involved shall exist. 







(b) If $p$ is limited (in the sense that it lies within a finite hypersphere), so
is $q$, and conversely.

Proof. Let $X_0$ be a limited $p$ set and $T_0$ the corresponding $q$ set such that
both (c) and (d) hold ($\epsilon > 0$):

$$\left| \int_{X_0} f\theta\,dX - \int_X f\theta\,dX \right| < \epsilon, \tag{c}$$

$$\left| \int_{T_0} \phi\theta'\,dT - \int_T \phi\theta'\,dT \right| < \epsilon. \tag{d}$$

It is easy to see that such an $X_0$ and a corresponding $T_0$ do exist, as follows:

Let $\bar{X}_0$ be a limited set for which (c) is true, and for which it will remain
true no matter what points are added to $\bar{X}_0$. Similarly, let $\bar{T}_0$ be a limited
set for which (d) is true and for which it will remain true, no matter what
points are added to $\bar{T}_0$. Presumably $\bar{X}_0$ and $\bar{T}_0$ do not correspond to each
other, but we may now let $X_0$ be the totality of all the points of $\bar{X}_0$ and of all
those points of $X$ corresponding to $\bar{T}_0$, and let $T_0$ be the totality of all the
points of $\bar{T}_0$ and of all those points of $T$ corresponding to $\bar{X}_0$. Then $X_0$ and
$T_0$ do correspond to each other and have the desired properties (c) and (d).
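Now, since $\theta$ is finite, it is limited in $X_0$. Let

$$|\theta| < H \text{ in } X_0. \tag{e}$$

Divide the interval $(-H, H)$ into $s$ equal subintervals of length $h$, thus defining
in $X_0$ according to Lebesgue the measurable sets,
$p_i$ $(i = 1, \cdots, s)$, and corresponding $q_i$ sets in $T_0$:

$$\theta_i \leq \theta < \theta_i + h \text{ in } p_i, \qquad \theta_i \leq \theta' < \theta_i + h \text{ in } q_i, \tag{f}$$

where $\theta_i = -H + (i - 1)h$ is the lower end of the $i$th subinterval.
Choose arbitrarily any point of $p_i$ and let $k_i$ be the corresponding value of $\theta$.
Then let

$$\bar{\theta} = k_i \text{ in } p_i \quad (i = 1, \cdots, s), \text{ and similarly let } \bar{\theta}' = k_i \text{ in } q_i \quad (i = 1, \cdots, s).$$

Then

$$\int_{X_0} \bar{\theta} f\,dX = \sum_i k_i \int_{p_i} f\,dX, \qquad \int_{T_0} \bar{\theta}' \phi\,dT = \sum_i k_i \int_{q_i} \phi\,dT.$$

Since by (I)

$$\int_{p_i} f\,dX = \int_{q_i} \phi\,dT,$$

we have

$$\int_{X_0} \bar{\theta} f\,dX = \int_{T_0} \bar{\theta}' \phi\,dT. \tag{g}$$

Also

$$\left| \int_{X_0} (\bar{\theta} - \theta) f\,dX \right| \leq \int_{X_0} |\bar{\theta} - \theta|\, f\,dX \leq h \int_{X_0} f\,dX,$$

$$\left| \int_{T_0} (\bar{\theta}' - \theta') \phi\,dT \right| \leq h \int_{T_0} \phi\,dT.$$

So, as $h$ approaches zero both sides of (g) approach limits and their limits are
equal:

$$\int_{X_0} \theta f\,dX = \int_{T_0} \theta' \phi\,dT.$$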
Xo To 


Hence by (c) and (d) the integrals

$$\int_X \theta f\,dX, \qquad \int_T \theta' \phi\,dT,$$

differ at most by $2\epsilon$, and so, being independent of $\epsilon$, they do not differ at all.
In order to determine the form of $f$ from (IV) one must first evaluate the
right side,

$$\int_T \theta' \phi\,dT = \psi,$$

and then solve the integral equation,

$$\int_X \theta f\,dX = \psi. \tag{10}$$


It is the solution of this equation that usually presents the most difficulty. 
Particular forms of $\theta$ that are being used are

$$\theta = e^{\alpha_1 x_1 + \cdots + \alpha_P x_P}, \tag{11}$$

in which case $\psi$ is said to be the "characteristic function" or "moment generating
function"; and

$$\theta = x_1^{\alpha_1} \cdots x_P^{\alpha_P}, \tag{12}$$

in which case $\psi$ is a "moment function" or "moment" of $f$. Other forms might
be used. For example, a very convenient method of demonstrating the correctness
of the usual formula for the simultaneous distribution of the correlation
$(x)$, means $(y, z)$, and variances $(u, v)$, in samples from a normal bivariate
universe is by the use of

$$\theta = e^{\alpha_1(u^2 + v^2 + y^2 + z^2) + \alpha_2(uvx + yz)}.$$



This method of finding $f$ is not a final determination of the probability function
desired until it has been shown that the solution is unique, a serious problem
in itself; it is one of those which Professor Shohat may consider.⁴ There are
three methods of solving the integral equation (10):

(i) The first might be called guessing. Though unscientific, it is in fact 
often effective. Especially is it available if the distribution has already been 
surmised but not demonstrated. Thus, it was open to Student (1908) when 
he correctly surmised the distribution of the variance. Similarly it was open 
to Soper (1913) when he incorrectly surmised the distribution of r. 

(ii) Papers by Romanovsky (1925) and Wilks (1932) have shown how the
problem of solving the integral equation may be shifted to the problem of
solving a partial differential equation, but this in turn may involve the solution
of another equally difficult integral equation in the process of determining the
arbitrary function.

(iii) If each $\alpha$ be replaced by an imaginary $i\beta$ and one uses a Fourier transform,
one arrives at a set of formulae which are most important. For the case
where there is but one $x$ and one $\beta$, they may be written:

$$\int_{-\infty}^{\infty} e^{i\beta x} f(x)\,dx = \int_T e^{i\beta x} \phi\,dT = \psi(\beta), \tag{13}$$

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\beta x}\, \psi(\beta)\,d\beta. \tag{14}$$


Dodd (1925) has given an equivalent set of formulae involving only real vari- 
ables. It is easy to prove that both sets may be changed to the single formula, 


(15) f(z) = Lf oa [ nae ae 


Kullback (1936) has established the validity of the formulae corresponding to
(13) and (14) for the general case of (P + Q) parameters. Wishart and Bartlett 
(1933) used the general forms to find the distribution of the generalized product 
moment in samples from an n-dimensional normal system. 
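The inversion formula (14) can be tried numerically on a case whose answer is known. The sketch below is illustrative only: it assumes a single variate $x$, truncates the range of $\beta$ to a finite interval, and uses the trapezoid rule; the characteristic function of a normal law is supplied as a test case. The names `invert_characteristic_function` and `normal_cf` are hypothetical.

```python
import cmath
import math

def invert_characteristic_function(psi, x, beta_max=40.0, steps=8000):
    """Approximate f(x) = (1/2pi) * integral of exp(-i*beta*x) * psi(beta) d(beta)."""
    h = 2.0 * beta_max / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        beta = -beta_max + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * beta * x) * psi(beta)
    return (total * h / (2.0 * math.pi)).real

def normal_cf(mean, sigma):
    """Characteristic function of a normal law, used here only as a test case."""
    return lambda beta: cmath.exp(1j * beta * mean - 0.5 * (sigma * beta) ** 2)
```

Applied to `normal_cf(1.0, 0.5)`, the inversion should reproduce the normal density with mean 1 and standard deviation 0.5. The truncation at `beta_max` is harmless here because this $\psi$ decays rapidly; a slowly decaying $\psi$ would need more care.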

When the solution of the integral equations of (IV) cannot be found, one 
has to put up with the semi-invariants or with the moments of f. Formulae 
(IV) and (11) yield the semi-invariants, (IV) and (12) the moments about the 
given origin, and from either of these one may obtain the moments about the 
mean point. These methods are old but they are still important. Time does 
not permit me to discuss them, because it would not be proper to close this 
paper without some reference to limit methods. 

Limit Methods. It is well known that the distribution of means of samples 
taken from almost⁵ any universe approaches the normal law as a limit as $N$
becomes infinite. This theorem is subject to great generalizations, as is indi- 
cated in papers of A. Liapounoff (1901), S. Bernstein (1926), Romanovsky 


4 In a later paper at the same symposium. 
5 There are exceptions. E.g., means of samples taken from the universe $a/\pi(a^2 + t^2)$
have a distribution identical with the universe itself.




(1929, 1930) and C. C. Craig (1932). Subject to very general conditions it 
has been shown that: If the characteristic function of one probability distri- 
bution contains a parameter and approaches as a limit, uniformly in every 
finite domain of its variables, the characteristic function of another probability 
distribution; then the first distribution approaches as a limit the second distribution.
Hence S. Bernstein and Romanovsky have shown that: If the universe
is an $n$-way correlation solid of a certain very general type, then the $n$ means
obtained by a selection of a sample of $N$ sets of variates,

$$x_i = \frac{1}{N}\,(t_{i1} + \cdots + t_{iN}) \qquad (i = 1, \cdots, n),$$

have a distribution which approaches as a limit a normal
correlation solid as $N$ becomes infinite. A similar theorem has been established
also in the interesting case of Romanovsky's "belonging coefficients", which
include K. Pearson's coefficient of racial likeness. Also, by the method of
maximum likelihood, Hotelling (1930) has proved that under certain general
conditions all optimum estimates of the parameters of a frequency distribution
have a joint distribution approaching the normal as $N$ becomes infinite. The
validity of the method of maximum likelihood when used for this purpose has
been established by J. L. Doob (1934).

Finally, one may note an apparently new limit theorem of another type.
Its general nature will be obvious from the following application:

Let a sample of $N$ be drawn from the universe,

$$\phi = \begin{cases} A e^{-t^{\lambda}} & \text{if } t > 0, \\ 0 & \text{if } t < 0. \end{cases}$$

It is readily proved, by means of (IV), that the distribution $f(x)$ of the parameter,

$$x = \left(t_1^{\lambda} + \cdots + t_N^{\lambda}\right)^{1/\lambda}$$

is a curve of the form,

$$f(x) = \begin{cases} B x^{N-1} e^{-x^{\lambda}} & \text{where } x > 0, \\ 0 & \text{elsewhere.} \end{cases}$$

Now let $\lambda$ become infinite. The universe approaches as a limit the rectangle:

$$\phi = \begin{cases} A & \text{where } 0 < t < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

The parameter $x$ approaches as a limit $X$, where $X$ = maximum $t_i$. The
distribution $f(x)$ approaches as a limit the new distribution,

$$F(X) = \begin{cases} N X^{N-1} & \text{where } 0 < X < 1, \\ 0 & \text{elsewhere.} \end{cases}$$







Hence we have proved in a new way, what was already known: that the distri- 
bution of the greatest variate obtained by sampling from a rectangular universe 
is of the form F(X). 

The limit theorem implicit in this illustration can be established in sufficient 
generality, but I do not yet know whether it has other applications of value. 
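The passage to the limit in this illustration may be followed numerically. The sketch below rests on an assumption not stated in the text: that the constant $B$ is fixed by requiring total probability one, which gives $B = \lambda/\Gamma(N/\lambda)$ on substituting $u = x^{\lambda}$. The function names are hypothetical. As $\lambda$ grows, $B$ should approach $N$ and the curve should approach $N X^{N-1}$ on $(0, 1)$.

```python
import math

def normalizing_constant(N, lam):
    """B such that B * x**(N-1) * exp(-x**lam) integrates to one over x > 0."""
    return lam / math.gamma(N / lam)

def density(x, N, lam):
    """f(x) = B x^(N-1) exp(-x^lam) for x > 0, and zero elsewhere."""
    if x <= 0.0:
        return 0.0
    log_power = lam * math.log(x)
    if log_power > 700.0:          # exp(-x**lam) is zero to machine precision here
        return 0.0
    return normalizing_constant(N, lam) * x ** (N - 1) * math.exp(-math.exp(log_power))

def limit_density(X, N):
    """Distribution of the greatest of N variates from the rectangle on (0, 1)."""
    return N * X ** (N - 1) if 0.0 < X < 1.0 else 0.0

def total_probability(N, lam, x_max=6.0, steps=6000):
    """Trapezoid check that the density integrates to one."""
    h = x_max / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * density(k * h, N, lam)
    return total * h
```

For instance, with $N = 5$ one may compare `density(0.8, 5, 2000.0)` with `limit_density(0.8, 5)`; the two should differ only slightly, and the difference should shrink as $\lambda$ is increased further.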


REFERENCES 


G. A. Baker, Transformations of bimodal distributions, Annals of Mathematical Statistics,
vol. 1 (1932), pp. 334-344.
G. A. Baker, Transformation of non-normal frequency distributions into normal distributions,
Annals of Mathematical Statistics, vol. 5 (1934), pp. 113-123.
S. Bernstein, Sur l'extension du théorème limite du calcul des probabilités aux sommes
de quantités dépendantes, Mathematische Annalen, vol. 97 (1926), pp. 1-59.
C. C. Craig, On the composition of dependent elementary errors, Annals of Mathematics,
vol. 33 (1932), pp. 184-206.
C. C. Craig, On the frequency function of xy, Annals of Mathematical Statistics, vol. 7
(1936), pp. 1-15.
E. L. Dodd, The frequency law of a function of variables with given frequency laws,
Annals of Mathematics, vol. 27 (1925), pp. 12-20.
J. L. Doob, Probability and statistics, Transactions of the American Mathematical Society,
vol. 36 (1934), pp. 759-775.
R. A. Fisher, Frequency distributions of the values of the correlation coefficient in samples
from an indefinitely large population, Biometrika, vol. 10 (1915), pp. 507-521.
R. A. Fisher, On a distribution yielding the error function of several well known statistics,
Proceedings of the International Mathematical Congress, Toronto, 1924, vol. 2,
pp. 805-813.
R. A. Fisher, Applications of "Student's" distribution, Metron, vol. 5, no. 3 (1925),
pp. 90-104.
H. Hotelling, The distribution of correlation ratios calculated from random data, Proceedings
of the National Academy of Sciences, vol. 11, no. 10 (1925), pp. 657-662.
H. Hotelling, An application of analysis situs to statistics, Bulletin of the American
Mathematical Society, vol. 33 (1927), pp. 467-476.
H. Hotelling, The consistency and ultimate distribution of optimum statistics, Transactions
of the American Mathematical Society, vol. 32 (1930), pp. 847-859.
J. O. Irwin, Recent advances in mathematical statistics, Journal of the Royal Statistical
Society, vol. 98, part 1 (1935), pp. 88-92.
D. Jackson, Mathematical principles in the theory of small samples, American Mathematical
Monthly, vol. 42 (1935), pp. 344-364.
S. Kullback, An application of characteristic functions to the distribution problem of
statistics, Annals of Mathematical Statistics, vol. 5 (1934), pp. 263-307.
S. Kullback, On certain distribution theorems of statistics, Bulletin of the American
Mathematical Society, vol. 42 (1936), pp. 407-410.
A. Liapounoff, Nouvelle forme du théorème sur la limite de probabilité, Mémoires de
l'Académie de St. Pétersbourg (8), vol. 11 (1901).
P. R. Rider, A survey of the theory of small samples, Annals of Mathematics, vol. 31
(1930), pp. 577-628.
P. R. Rider, Recent progress in statistical method, Journal of the American Statistical
Association, vol. 30 (1935), pp. 58-88.
H. L. Rietz, Frequency distributions obtained by certain transformations of normally
distributed variates, Annals of Mathematics (2) vol. 23 (1921-22), pp. 292-300.
H. L. Rietz, On certain properties of frequency distributions of the powers and roots of
the variates of a given distribution, Proceedings of the National Academy of
Sciences, vol. 13 (1927), pp. 817-820.
V. Romanovsky, On the moments of the standard deviations and of correlation coefficients
in samples from a normal population, Metron, vol. 5(4) (1925), pp. 1-45.
V. Romanovsky, Sur une extension du théorème de A. Liapounoff sur la limite de probabilité,
Bulletin de l'Académie des Sciences de l'U. S. S. R. (1929), pp. 209-225.
V. Romanovsky, On the moments of means of functions of one or more random variables,
Metron, vol. 8(1-2) (1930), pp. 251-291.
W. A. Shewhart, Annual Survey of Statistical Technique: Sample Theory, Econometrica,
vol. 1 (1933), pp. 225-237.
H. E. Soper, On the probable error of the correlation coefficient to a second approximation,
Biometrika, vol. 9 (1913), pp. 91-115.
Student, The probable error of the mean, Biometrika, vol. 6 (1908), pp. 1-25.
S. S. Wilks, On the distributions of statistics in samples from a normal population of two
variables with matched sampling of one variable, Metron, vol. 9(3-4) (1932),
pp. 87-126.
J. Wishart, The generalized product moment distribution in samples from a normal
multivariate population, Biometrika, vol. 20A (1928), pp. 32-52.
J. Wishart and M. S. Bartlett, The generalized product moment distribution in a normal
system, Proceedings of the Cambridge Philosophical Society, vol. 29 (1933),
pp. 260-270.


WESLEYAN UNIVERSITY. 





MOMENT RECURRENCE RELATIONS FOR BINOMIAL, POISSON
AND HYPERGEOMETRIC FREQUENCY DISTRIBUTIONS¹


By John Riordan


1. Introduction. This paper gives the development of recurrence relations 
for moments about the origin and mean of binomial, Poisson, and hyper- 
geometric frequency distributions from the basis of the moment arrays defined 
by H. E. Soper.² This procedure has the advantage of expressing the moments
in terms of coefficients which are alike for the three distributions and are de- 
rivable by a single process, thus providing a degree of formal coordination of 
the distributions. For both kinds of moments, the coefficients satisfy relatively 
simple recurrence relations, the use of which leads to recurrence relations for 
the moments, thus unifying the derivation of these relations for the three 
distributions. The relations derived in this way for the hypergeometric dis- 
tribution are apparently new. Apparently new recurrence relations for certain 
auxiliary coefficients in the expression of the moments about the mean of 
binomial and Poisson distributions are also given. 

This course of development involves repetition of a number of well-known 
results which is justified, it is hoped, by the unification obtained.³


1 Presented to the American Mathematical Society, Sept. 3, 1936. 

2 Frequency Arrays, Cambridge, 1922. 

3 The following bibliography is taken from a paper On the Bernoulli Distribution, Solomon
Kullback, Bull. Am. Math. Soc., 41, 12, pp. 857-864, (Dec., 1935):


A. Fisher, The Mathematical Theory of Probabilities, 2d ed., p. 104 ff. 

H. L. Rietz, Mathematical Statistics, 1927, p. 26 ff. 

V. Mises, Wahrscheinlichkeitsrechnung, 1931, pp. 131-133. 

Risser and Traynard, Les Principes de la Statistique Mathématique, 1933, pp. 39-40 and 
320-321. 

V. Romanovsky, Note on the moments of the binomial (q + p)” about its mean, Biometrika, 
vol. 15 (1923), pp. 410-412. 

A. T. Craig, Note on the moments of a Bernoulli distribution, Bull. Am. Math. Soc., vol. 40 
(1934), pp. 262-264. 

A. R. Crathorne, Moments de la binomiale par rapport à l'origine, Comptes Rendus, vol. 198
(1934), p. 1202;

A. A. K. Ayyangar, Note on the recurrence formulae for the moments of the point binomial,
Biometrika, vol. 26 (1934), pp. 262-264.


To this, besides Soper’s tract already mentioned, should be added: 


Ch. Jordan, Statistique Mathematique, Paris, 1927. 
K. Pearson, On Certain Properties of the Hypergeometric Series . . . , Phil. Mag., 47, pp. 236- 
246 (1899). 










104 JOHN RIORDAN 












2. Moment Arrays. As developed by Soper, frequency distributions may be
exhibited by frequency arrays, in the case of a single variate, in the form:

$$f(A) = \sum_x p_x A^x \tag{2.1}$$

where $p_x$ are the frequencies with which the measures, $x$, of the character, $A$,
occur in a population.

The substitution $A = e^{\alpha}$ leads to the moment about the origin array:

$$f(e^{\alpha}) = \sum_x p_x e^{\alpha x} = \sum_x p_x \left(1 + x\alpha + \frac{x^2\alpha^2}{2!} + \cdots\right) = \sum_s m_s \frac{\alpha^s}{s!} \tag{2.2}$$

where

$$m_s = \sum_x p_x x^s.$$

The symbol $\alpha$ is a logical or umbral symbol serving merely to identify the
moments in the expansion of the array.

The moment array for moments about the mean is found from the relation:

$$f(e^{\alpha})\, e^{-m_1 \alpha} = \sum_s \mu_s \frac{\alpha^s}{s!}$$

where $m_1$ is the first moment about the origin.
The moment arrays for the distributions concerned are as follows:

$$\text{Binomial} \qquad f(e^{\alpha}) = [1 + p(e^{\alpha} - 1)]^n = \sum_{x=0}^{n} \binom{n}{x} p^x (e^{\alpha} - 1)^x$$

$$\text{Poisson} \qquad f(e^{\alpha}) = e^{a(e^{\alpha} - 1)} = \sum_{x=0}^{\infty} \frac{a^x (e^{\alpha} - 1)^x}{x!}$$

$$\text{Hypergeometric} \qquad f(e^{\alpha}) = \sum_{x=0} \frac{(l)_x (r)_x}{(n)_x}\, \frac{(e^{\alpha} - 1)^x}{x!}$$

where the parameters $p$, $n$, and $a$ for the binomial and Poisson have the usual
significance. The parameters for the hypergeometric distribution, with the
substitution $r = s$, follow Soper; Pearson (loc. cit.) uses $q$, $r$, $n$, where $q = l/n$.
The notation $(l)_x$ means

$$(l)_x = l(l - 1) \cdots (l - x + 1).$$

It will be seen that, with the usual interpretation of $\binom{n}{x}$ as zero for $x > n$,










MOMENT RECURRENCE RELATIONS FOR DISTRIBUTIONS 105 


















the three distributions so far as concerns $\alpha$ may be exhibited by a function
of the form

$$f(e^{\alpha}) = \sum_{x=0}^{\infty} A_x (e^{\alpha} - 1)^x$$

where $A_x$ of course depends on the distribution concerned.
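3. Moments About the Origin. The moments about the origin can then be
defined by the equation:

$$\sum_{s=0}^{\infty} m_s \frac{\alpha^s}{s!} = \sum_{x=0}^{\infty} A_x (e^{\alpha} - 1)^x \tag{3.1}$$

and

$$\sum_{x=0}^{\infty} A_x (e^{\alpha} - 1)^x = \sum_{x=0}^{\infty} A_x \sum_{v=0}^{x} (-1)^{x-v} \binom{x}{v} e^{v\alpha} = \sum_{s=0}^{\infty} \frac{\alpha^s}{s!} \sum_{x=0}^{s} x!\, A_x S_{x,s}$$

where $S_{x,s}$ is a Stirling number of the second kind, as used by Jordan (loc. cit.)
and defined by

$$x!\, S_{x,s} = \sum_{v=0}^{x} (-1)^{x-v} \binom{x}{v} v^s = \Delta^x 0^s,$$

$\Delta^x 0^s$ being in the language of the finite difference calculus, a "difference of
nothing", that is $\Delta^x n^s \,|_{\,n = 0}$.
The internal series terminates at $s$ because $S_{x,s} = 0$, $x > s$, as is readily
apparent in the finite difference expression. Further $S_{0,s} = 0$, $s \neq 0$; $S_{0,0} = 1$.
By equating coefficients in equation (3.1), $m_s$, the $s$th moment about the
origin, is given by

$$m_s = \sum_{x=0}^{s} x!\, A_x S_{x,s} \tag{3.2}$$

The particular forms for the three distributions are as follows:

$$m_s = \sum_{x=0}^{s} (n)_x\, p^x S_{x,s} \qquad \text{Binomial} \tag{3.3}$$

$$m_s = \sum_{x=0}^{s} a^x S_{x,s} \qquad \text{Poisson} \tag{3.4}$$

$$m_s = \sum_{x=0}^{s} \frac{(l)_x (r)_x}{(n)_x}\, S_{x,s} \qquad \text{Hypergeometric} \tag{3.5}$$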
The Stirling numbers have the following recurrence relation (Jordan loc.
cit.):

$$S_{x,s+1} = x S_{x,s} + S_{x-1,s}. \tag{3.6}$$
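Relations (3.3) and (3.6) may be tried out in exact arithmetic. The sketch below is illustrative only: it builds the Stirling numbers from the recurrence (3.6), forms the binomial moments about the origin by (3.3), and compares them with moments computed directly from the binomial frequencies. The function names are hypothetical, and `Fraction` is used so that the comparison is exact.

```python
from fractions import Fraction
from math import comb

def stirling_table(s_max):
    """S[s][x] for 0 <= x, s <= s_max, from S(x, s+1) = x*S(x, s) + S(x-1, s)."""
    S = [[0] * (s_max + 1) for _ in range(s_max + 1)]
    S[0][0] = 1
    for s in range(s_max):
        for x in range(1, s + 2):
            S[s + 1][x] = x * S[s][x] + S[s][x - 1]
    return S

def falling(n, x):
    """(n)_x = n (n - 1) ... (n - x + 1)."""
    out = 1
    for j in range(x):
        out *= n - j
    return out

def binomial_moment_stirling(n, p, s):
    """m_s by (3.3): sum over x of (n)_x p^x S(x, s)."""
    S = stirling_table(s)
    return sum(falling(n, x) * p ** x * S[s][x] for x in range(s + 1))

def binomial_moment_direct(n, p, s):
    """m_s summed directly over the binomial frequencies."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) * k ** s for k in range(n + 1))
```

With `p = Fraction(1, 3)` and `n = 7`, for example, the two functions should return identical fractions for each `s`.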










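This relation in conjunction with equations (3.3)-(3.5) leads to moment recurrence
relations. The procedure is illustrated for the binomial distribution as
follows:

$$\begin{aligned} m_{s+1} &= \sum_{x=0}^{s+1} (n)_x\, p^x S_{x,s+1} \\ &= \sum_{x=0}^{s+1} (n)_x\, p^x \left(x S_{x,s} + S_{x-1,s}\right) \\ &= pD_p m_s + \left(np\,m_s - p^2 D_p m_s\right) \\ &= np\,m_s + pq\,D_p m_s \end{aligned}$$

where $q = 1 - p$.
The steps in the process are expanded as follows:

$$\begin{aligned} \sum_{x=0}^{s+1} (n)_x\, p^x\, x S_{x,s} &= \sum_{x=0}^{s} (n)_x\, p^x\, x S_{x,s} \\ &= \sum_{x=0}^{s} (n)_x\, S_{x,s}\; pD_p(p^x) \\ &= pD_p m_s \end{aligned}$$

$$\begin{aligned} \sum_{x=1}^{s+1} (n)_x\, p^x S_{x-1,s} &= \sum_{x=1}^{s+1} (n - x + 1)\,(n)_{x-1}\, p^x S_{x-1,s} \\ &= n \sum_{x=0}^{s} (n)_x\, p^{x+1} S_{x,s} - \sum_{x=0}^{s} x\,(n)_x\, p^{x+1} S_{x,s} \\ &= np\,m_s - p^2 D_p m_s \end{aligned}$$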


The results for the three distributions are as follows:

$$m_{s+1} = np\,m_s + pq\,D_p m_s \qquad \text{Binomial} \tag{3.7}$$

$$m_{s+1} = a\,m_s + a\,D_a m_s \qquad \text{Poisson} \tag{3.8}$$

$$m_{s+1} = m_1\, m_s(l - 1, r - 1, n - 1) - (n + 1)\,\Delta_n m_s \qquad \text{Hypergeometric} \tag{3.9}$$

Here $D_p$ and $D_a$ denote differentiation with respect to $p$ and $a$, respectively,
and $\Delta_n$ denotes the difference operation with respect to $n$. For the hypergeometric
distribution the moments are functions of $l$, $r$, and $n$ as well as of $s$;
$m_s(l - 1, r - 1, n - 1)$ is the same function of $l - 1$, $r - 1$ and $n - 1$ as
$m_s(l, r, n)$ is of $l$, $r$, $n$. Equation (3.9) appears to be new.
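Equation (3.9) can be tested in exact arithmetic. The sketch below makes two assumptions that should be read as such: the hypergeometric variate is taken as the number of marked items in a sample of $r$ drawn without replacement from $n$ items of which $l$ are marked (so that $m_1 = lr/n$), and $\Delta_n$ is read as the forward difference $m_s(l, r, n + 1) - m_s(l, r, n)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeometric_moment(l, r, n, s):
    """s-th moment about the origin, summed directly over the frequencies."""
    total = Fraction(0)
    for x in range(min(l, r) + 1):
        # comb returns 0 when r - x exceeds n - l, so impossible cases drop out
        prob = Fraction(comb(l, x) * comb(n - l, r - x), comb(n, r))
        total += prob * x ** s
    return total

def next_moment_by_3_9(l, r, n, s):
    """Right side of (3.9), with Delta_n taken as a forward difference (assumed)."""
    m1 = Fraction(l * r, n)
    shifted = hypergeometric_moment(l - 1, r - 1, n - 1, s)
    difference = hypergeometric_moment(l, r, n + 1, s) - hypergeometric_moment(l, r, n, s)
    return m1 * shifted - (n + 1) * difference
```

Under these readings, `next_moment_by_3_9(l, r, n, s)` should equal `hypergeometric_moment(l, r, n, s + 1)` exactly, for instance with `l = 5`, `r = 4`, `n = 12`.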








For convenience of reference, a short table of the Stirling numbers of the
second kind, $S_{x,s}$, follows:

  s \ x     1     2     3     4     5

    1       1
    2       1     1
    3       1     3     1
    4       1     7     6     1
    5       1    15    25    10     1


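4. Moments About the Mean. As shown in Section 2 above, moments
about the mean may be defined as follows:

$$\sum_{s=0}^{\infty} \mu_s \frac{\alpha^s}{s!} = \sum_{x=0}^{\infty} A_x\, e^{-m_1 \alpha} (e^{\alpha} - 1)^x \tag{4.1}$$

where $m_1$ is the first moment about the origin:

$$m_1 = \begin{cases} np & \text{Binomial} \\ a & \text{Poisson} \\ lr/n & \text{Hypergeometric} \end{cases}$$

Now

$$\sum_{x=0}^{\infty} A_x\, e^{-m_1 \alpha} (e^{\alpha} - 1)^x = \sum_{x=0}^{\infty} A_x \sum_{v=0}^{x} (-1)^{x-v} \binom{x}{v} e^{(v - m_1)\alpha} = \sum_{s=0}^{\infty} \frac{\alpha^s}{s!} \sum_{x=0}^{s} x!\, A_x \sigma_{x,s}$$

where

$$x!\, \sigma_{x,s} = \sum_{v=0}^{x} (-1)^{x-v} \binom{x}{v} (v - m_1)^s = \Delta^x (-m_1)^s.$$

It will be observed that for $m_1 = 0$, $\sigma_{x,s} = S_{x,s}$. The internal series terminates
at $s$ for the same reason as before.
The moments about the mean are then given by:

$$\mu_s = \sum_{x=0}^{s} x!\, A_x \sigma_{x,s}. \tag{4.2}$$

The particular forms for the three distributions are as follows:

$$\mu_s = \sum_{x=0}^{s} (n)_x\, p^x \sigma_{x,s} \qquad \text{Binomial} \tag{4.3}$$

$$\mu_s = \sum_{x=0}^{s} a^x \sigma_{x,s} \qquad \text{Poisson} \tag{4.4}$$

$$\mu_s = \sum_{x=0}^{s} \frac{(l)_x (r)_x}{(n)_x}\, \sigma_{x,s} \qquad \text{Hypergeometric.} \tag{4.5}$$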













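The coefficients $\sigma_{x,s}$ satisfy the following recurrence relation:⁴

$$\sigma_{x,s+1} = (x - m_1)\,\sigma_{x,s} + \sigma_{x-1,s} \tag{4.6}$$

which in conjunction with equations (4.3)-(4.5) leads to moment recurrence
relations as before. The actual derivation is somewhat complicated by the
circumstance that $\sigma_{x,s}$ is a function of $m_1$ and therefore of the frequency parameters,
rather than a constant as before. The derivation is illustrated for the
binomial distribution as follows:

$$\begin{aligned} \mu_{s+1} &= \sum_{x=0}^{s+1} (n)_x\, p^x \sigma_{x,s+1} \\ &= \sum_{x=0}^{s+1} (n)_x\, p^x \left[(x - np)\,\sigma_{x,s} + \sigma_{x-1,s}\right] \\ &= \sum_{x=0}^{s} (n)_x\, \sigma_{x,s}\; pD_p(p^x) - np\,\mu_s + \sum_{x=1}^{s+1} (n)_x\, p^x \sigma_{x-1,s} \\ &= pD_p\mu_s + nsp\,\mu_{s-1} - np\,\mu_s + np\,\mu_s - p^2\left[D_p\mu_s + ns\,\mu_{s-1}\right] \\ &= pq\left[ns\,\mu_{s-1} + D_p\mu_s\right]. \end{aligned}$$

The steps in the process are expanded as follows:

$$\begin{aligned} \sum_{x=0}^{s} (n)_x\, \sigma_{x,s}\; pD_p(p^x) &= \sum_{x=0}^{s} (n)_x \left[pD_p(p^x \sigma_{x,s}) - p^x\, pD_p(\sigma_{x,s})\right] \\ &= pD_p\mu_s - p \sum_{x=0}^{s} (n)_x\, p^x \left(-ns\,\sigma_{x,s-1}\right) \\ &= pD_p\mu_s + nsp\,\mu_{s-1} \end{aligned}$$

$$\begin{aligned} \sum_{x=1}^{s+1} (n)_x\, p^x \sigma_{x-1,s} &= \sum_{x=1}^{s+1} (n - x + 1)\,(n)_{x-1}\, p^x \sigma_{x-1,s} \\ &= n \sum_{x=0}^{s} (n)_x\, p^{x+1} \sigma_{x,s} - \sum_{x=0}^{s} x\,(n)_x\, p^{x+1} \sigma_{x,s} \\ &= np\,\mu_s - p^2\left[D_p\mu_s + ns\,\mu_{s-1}\right]. \end{aligned}$$

The relation $D_p \sigma_{x,s} = -ns\,\sigma_{x,s-1}$ is obtained from the definition equation of
$\sigma_{x,s}$ (with $m_1 = np$).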
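The definition of $\sigma_{x,s}$, the recurrence (4.6), and the binomial form (4.3) may be checked together in exact arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the direct central moments are computed from the binomial frequencies for comparison.

```python
from fractions import Fraction
from math import comb, factorial

def sigma(x, s, m1):
    """sigma_{x,s} from x! * sigma = sum over v of (-1)^(x-v) C(x,v) (v - m1)^s."""
    total = sum((-1) ** (x - v) * comb(x, v) * (v - m1) ** s for v in range(x + 1))
    return Fraction(total) / factorial(x)

def falling(n, x):
    """(n)_x = n (n - 1) ... (n - x + 1)."""
    out = 1
    for j in range(x):
        out *= n - j
    return out

def binomial_central_moment_sigma(n, p, s):
    """mu_s by (4.3), with m1 = np."""
    m1 = n * p
    return sum(falling(n, x) * p ** x * sigma(x, s, m1) for x in range(s + 1))

def binomial_central_moment_direct(n, p, s):
    """mu_s summed directly over the binomial frequencies."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) * (k - n * p) ** s for k in range(n + 1))
```

With $m_1 = 0$ the function `sigma` reduces to the Stirling numbers, for example `sigma(2, 4, 0)` gives 7; with `p = Fraction(1, 3)` and `n = 7` the two central-moment functions should agree exactly.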


The resulting recurrence relations for the three distributions are as follows:

$$\mu_{s+1} = nspq\,\mu_{s-1} + pq\,D_p\mu_s \qquad \text{Binomial} \tag{4.7}$$

$$\mu_{s+1} = as\,\mu_{s-1} + a\,D_a\mu_s \qquad \text{Poisson} \tag{4.8}$$

4 Jordan, loc. cit. or E. C. Molina, An Expansion for Laplacian Integrals ..., Bell
System Technical Journal, 11, p. 571.







Mey = (n + 1) E > » @) Kj we» (Ly ry + | Hypergeometric 
v=0 


. ue < (3) Ki nl am | 


v=0 


—Ir 


ee 9 * 


1 


~ @=-D 8 
The last of these, which appears to be new, seems to be of formal interest only. 
The coefficients $\sigma_{x,s}$ are related to the Stirling numbers by the expression:

$$\sigma_{x,s} = \sum_{v=0}^{s-x} (-1)^v \binom{s}{v} m_1^v\, S_{x,s-v} = \sum_{v=0}^{s-x} a_v m_1^v$$

and consequently can be exhibited with detached coefficients in the form
$a_0 + a_1 + a_2 + \cdots + a_{s-x}$. For the binomial and Poisson distributions
certain simplifications, to be developed in the section following, in equations
(4.3) and (4.4) may be made. For the hypergeometric distribution it appears
necessary to use equation (4.5); the following short table of $\sigma_{x,s}$, employing the
detached coefficients mentioned above, is given for this purpose:

  s \ x   0              1              2              3           4       5

    1     0-1            1
    2     0+0+1          1-2            1
    3     0+0+0-1        1-3+3          3-3            1
    4     0+0+0+0+1      1-4+6-4        7-12+6         6-4         1
    5     0+0+0+0+0-1    1-5+10-10+5    15-35+30-10    25-30+10    10-5    1


5. Binomial and Poisson Moments About the Mean—Simplified Formulas. 
5.1 Binomial. From examination of the first few moments about the mean,
it appears expedient⁵ to write the formulas:

$$\mu_{2s} = \sum_{x=1}^{s} \alpha_{x,2s}\,(npq)^x, \qquad \mu_{2s+1} = (q - p) \sum_{x=1}^{s} \alpha_{x,2s+1}\,(npq)^x \tag{5.1.1}$$

5 The kind of expression chosen admits of some variety. A recurrence relation for
coefficients in the expansion $\mu_s = \sum_{x=1}^{s} a_{x,s}\, p^x$ has been given by E. H. Larguier, On a Method
For Evaluating the Moments of a Bernoulli Distribution, Bull. Am. Math. Soc., 42, 1, p. 24
(Abstract 8); I am indebted to Mr. Larguier for the opportunity of examining his results
in advance of publication.










When these are substituted into the moment recurrence relation, the coefficients
are found to be related as follows:

$$\alpha_{x,2s} = [x + pqD_{pq}]\,\alpha_{x,2s-1} + (2s - 1)\,\alpha_{x-1,2s-2} - 2pq\,[1 + 2x + 2pqD_{pq}]\,\alpha_{x,2s-1}$$

$$\alpha_{x,2s+1} = [x + pqD_{pq}]\,\alpha_{x,2s} + 2s\,\alpha_{x-1,2s-1}$$

or, in general,

$$\alpha_{x,s+1} = [x + pqD_{pq}]\,\alpha_{x,s} + s\,\alpha_{x-1,s-1} - pq\,[1 - (-1)^s]\,[1 + 2x + 2pqD_{pq}]\,\alpha_{x,s} \tag{5.1.2}$$

Using detached coefficients of powers of $pq$ as outlined above, these coefficients
may be exhibited as follows:

  s \ x   1                       2                    3             4

    2     1
    3     1
    4     1-6                     3
    5     1-12                    10
    6     1-30+120                25-130               15
    7     1-60+360                56-462               105
    8     1-126+1680-5040         119-2156+7308        490-2380      105
    9     1-252+5040-20160        246-6948+32112       1918-13216    1260


It may be noted that the coefficients of the first column in conjunction with 
equations (5.1.1) give the binomial seminvariants. 
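The rows of the table for $s = 4, 5, 6$ may be compared with central moments computed directly from the binomial frequencies. The sketch below is illustrative only; the dictionary `ALPHA` and the function names are hypothetical, and the detached coefficients are entered as coefficients of successive powers of $pq$, with the factor $(q - p)$ supplied for odd $s$ as in (5.1.1).

```python
from fractions import Fraction
from math import comb

# ALPHA[s][x] lists the detached coefficients of (pq)^0, (pq)^1, ... for alpha_{x,s}.
ALPHA = {
    4: {1: [1, -6], 2: [3]},
    5: {1: [1, -12], 2: [10]},
    6: {1: [1, -30, 120], 2: [25, -130], 3: [15]},
}

def central_moment_from_table(n, p, s):
    """mu_s from (5.1.1): sum over x of alpha_{x,s} (npq)^x, times (q - p) if s is odd."""
    q = 1 - p
    pq = p * q
    total = 0
    for x, coeffs in ALPHA[s].items():
        alpha = sum(c * pq ** j for j, c in enumerate(coeffs))
        total += alpha * (n * pq) ** x
    return total * (q - p) if s % 2 else total

def central_moment_direct(n, p, s):
    """mu_s summed directly over the binomial frequencies."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) * (k - n * p) ** s for k in range(n + 1))
```

With `n = 9` and `p = Fraction(1, 4)`, for example, the two functions should return identical fractions for `s = 4`, `5` and `6`.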

Equations (5.1.1) make the coefficients functions of $pq$ only; a slight alteration
makes the coefficients functions of $n$ only. Thus:

$$\mu_{2s} = \sum_{x=1}^{s} \beta_{x,2s}\,(pq)^x, \qquad \mu_{2s+1} = (q - p) \sum_{x=1}^{s} \beta_{x,2s+1}\,(pq)^x \tag{5.1.3}$$

and the coefficients are found to satisfy the recurrence relation:

$$\beta_{x,s+1} = x\,\beta_{x,s} + ns\,\beta_{x-1,s-1} - [1 - (-1)^s]\,(2x - 1)\,\beta_{x-1,s} \tag{5.1.4}$$


These coefficients may be exhibited by a rearrangement of the table given
above, as may be seen by comparing equations (5.1.1) and (5.1.3). The first
few coefficients, written as detached coefficients of ascending powers of n, are as follows:

                Coefficients $\beta_{x,s}$

  s    x = 1    x = 2         x = 3

  4    1        -6 + 3
  5    1        -12 + 10
  6    1        -30 + 25      120 - 130 + 15
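
The recurrence (5.1.4) can be checked numerically against moments computed directly from the binomial terms. The following Python sketch is an illustration only. The function names are hypothetical, the starting value $\beta_{0,0} = 1$ (that is, $\mu_0 = 1$) is an assumption not stated in the text, and the last term of the recurrence is taken as $[1 - (-1)^s](2x - 1)\beta_{x-1,s}$, which agrees with the tabulated values.

```python
from fractions import Fraction
from math import comb

def beta_coefficients(n, s_max):
    """Return a list beta with beta[s][x] = beta_{x,s} evaluated at a numeric n.

    Assumed start (not stated in the text): beta[0] = {0: 1}, i.e. mu_0 = 1.
    """
    beta = [{0: Fraction(1)}]
    for s in range(s_max):
        prev = beta[s - 1] if s >= 1 else {}
        row = {}
        for x in range((s + 1) // 2 + 1):
            row[x] = (x * beta[s].get(x, 0)
                      + n * s * prev.get(x - 1, 0)
                      - (1 - (-1) ** s) * (2 * x - 1) * beta[s].get(x - 1, 0))
        beta.append(row)
    return beta

def moment_from_recurrence(n, p, s):
    """mu_s built from the coefficients, as in equations (5.1.3)."""
    q = 1 - p
    total = sum(b * (p * q) ** x for x, b in beta_coefficients(n, s)[s].items())
    return total if s % 2 == 0 else (q - p) * total

def moment_direct(n, p, s):
    """mu_s summed term by term over the binomial distribution."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) * (k - n * p) ** s
               for k in range(n + 1))
```

With exact rational arithmetic (for example `n = 7`, `p = Fraction(1, 3)`), the two functions can be compared for equality rather than to a tolerance.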


5.2 Poisson. The Poisson moments about the mean may be expressed as
follows:

\[ \mu_s = \sum_{x=0}^{[s/2]} \alpha_{x,s}\,a^x \tag{5.2.1} \]

where [ ] represents “integral part of” and

\[ \alpha_{x,s+1} = x\,\alpha_{x,s} + s\,\alpha_{x-1,s-1}. \tag{5.2.2} \]

The coefficients $\alpha_{x,s}$ are the constant terms in the expressions for the
corresponding binomial distribution coefficients in powers of pq.
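
As a hedged illustration of (5.2.2), the short Python sketch below builds the Poisson coefficients and compares the resulting moments with moments summed directly from the Poisson terms. The function names, the symbol `a` for the Poisson mean, and the starting value of 1 for the coefficient with x = 0, s = 0 are assumptions made for this sketch.

```python
from math import exp

def poisson_coefficients(s_max):
    """alpha[s][x] from alpha_{x,s+1} = x*alpha_{x,s} + s*alpha_{x-1,s-1}."""
    alpha = [{0: 1}]                      # assumed start: mu_0 = 1
    for s in range(s_max):
        prev = alpha[s - 1] if s >= 1 else {}
        alpha.append({x: x * alpha[s].get(x, 0) + s * prev.get(x - 1, 0)
                      for x in range((s + 1) // 2 + 1)})
    return alpha

def moment_from_coefficients(a, s):
    """mu_s as a polynomial in the Poisson mean a, as in (5.2.1)."""
    return sum(c * a ** x for x, c in poisson_coefficients(s)[s].items())

def moment_direct(a, s, terms=150):
    """mu_s summed over the first `terms` Poisson probabilities."""
    total, pmf = 0.0, exp(-a)
    for k in range(terms):
        total += pmf * (k - a) ** s
        pmf *= a / (k + 1)                # next Poisson probability
    return total
```

The rows produced for s = 6 and s = 8 can be compared with the leading (constant) terms in table (5.1.2).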


BELL TELEPHONE LABORATORIES. 





















NOTE ON ZOCH’S PAPER ON THE POSTULATE OF THE 
ARITHMETIC MEAN 


By ALBERT WERTHEIMER 


1. Introduction. There appeared recently a paper by Richmond T. Zoch¹
entitled “On the Postulate of the Arithmetic Mean.” The stated purpose of
his paper was to show that the derivation of the Postulate as given by Whittaker
& Robinson is not correct. It is the purpose of this paper to show
that Zoch has not proven any error to exist in the Whittaker & Robinson
derivation, but that there are a few errors in his paper. As this paper is intended
to be read with Zoch’s paper as a reference, the terms used there will not be
redefined here, and except where otherwise stated, the symbols used will have
the same meaning.


2. Zoch introduces the function

\[ f = \bar{x} + a\,\mu_3/\mu_2 \]

and claims that it satisfies all the four axioms of Whittaker & Robinson, and
obviously it is not the arithmetic mean. He therefore concludes that their
derivation must have errors somewhere, and proceeds to find them. Let us
first examine the f function. Considering only the part $\mu_3/\mu_2$, the partial
derivatives with respect to $x_i$ are given by

\[ \frac{3\mu_2\{(x_i - \bar{x})^2 - \mu_2\} - 2\mu_3\,(x_i - \bar{x})}{n\,\mu_2^2} \]


It is then stated (p. 172) “... clearly these partial derivatives are single valued
and continuous. Therefore the function $\mu_3/\mu_2$ satisfies axiom IV.” Now,
the condition that a function be continuous and single valued means of course
that this be true throughout the region of definition of the function. It is not
shown how these derivatives are clearly continuous and single valued for the
very important case where all the x’s are equal and the derivatives become
indeterminate. As a matter of fact they are not continuous in this case, and
therefore the f function does not satisfy axiom IV. To prove this, we only
have to consider the very simple case where we let

\[ x_i = k + c_i z \]


1 This Journal, Vol. VI, No. 4, Dec. 1935, pp. 171-182.


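where k is a fixed constant, the $c_i$ are a set of arbitrary constants not all equal, and
z is a parameter. We then have

\[ \bar{x} = k + \bar{c}\,z, \qquad \mu_2 = \bar{\mu}_2\,z^2, \qquad \mu_3 = \bar{\mu}_3\,z^3 \]

where

\[ \bar{c} = \frac{1}{n}\sum c_i, \qquad \bar{\mu}_2 = \frac{1}{n}\sum (c_i - \bar{c})^2, \qquad \bar{\mu}_3 = \frac{1}{n}\sum (c_i - \bar{c})^3 \]

Substituting these values in f and the derivatives, we get, taking a = 1,

\[ f = k + \bar{c}\,z + \frac{z^3\,\bar{\mu}_3}{z^2\,\bar{\mu}_2} \]

\[ \frac{\partial f}{\partial x_i} = \frac{1}{n} + \frac{3z^2\bar{\mu}_2\{z^2(c_i - \bar{c})^2 - z^2\bar{\mu}_2\} - 2z^4\bar{\mu}_3\,(c_i - \bar{c})}{n\,z^4\,\bar{\mu}_2^2} \]

Now going to the limit when z approaches zero, and all the x’s approach k,
we get

\[ \lim_{z \to 0} f = k, \]

\[ \lim_{z \to 0} \frac{\partial f}{\partial x_i} = \frac{1}{n}\left\{-2 + \frac{3(c_i - \bar{c})^2}{\bar{\mu}_2} - \frac{2\bar{\mu}_3\,(c_i - \bar{c})}{\bar{\mu}_2^2}\right\} \]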
Thus, when all the x’s approach the same value, the function f also approaches
the same value independent of the c’s, that is, regardless of the mode of approach,
while the derivatives can take on any value depending on the c’s, that is, on
how the limiting value of f is approached. The f function then does not have
continuous single valued partial derivatives, and therefore does not satisfy
axiom IV.
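
The dependence of the limiting derivatives on the mode of approach can be seen in a small numerical case. The Python sketch below is an illustration under stated assumptions: a = 1 as in the text, exact rational arithmetic, and two arbitrarily chosen sets of constants (`path_a`, `path_b`); all function and variable names are hypothetical.

```python
from fractions import Fraction

def moments(values):
    """Mean and second and third moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu3 = sum((v - mean) ** 3 for v in values) / n
    return mean, mu2, mu3

def f(xs):
    """Zoch's function with a = 1: mean + mu3/mu2."""
    mean, mu2, mu3 = moments(xs)
    return mean + mu3 / mu2

def grad_f(xs):
    """Partial derivatives of f, from the quotient formula in section 2."""
    n = len(xs)
    mean, mu2, mu3 = moments(xs)
    return [Fraction(1, n)
            + (3 * mu2 * ((x - mean) ** 2 - mu2) - 2 * mu3 * (x - mean)) / (n * mu2 ** 2)
            for x in xs]

def limiting_grad(cs):
    """The limit of the partial derivatives as z -> 0 along x_i = k + c_i z."""
    n = len(cs)
    cbar, m2, m3 = moments(cs)
    return [(-2 + 3 * (c - cbar) ** 2 / m2 - 2 * m3 * (c - cbar) / m2 ** 2) / n
            for c in cs]

def approach(k, cs, z):
    """The points x_i = k + c_i z."""
    return [k + c * z for c in cs]

k = Fraction(5)
z = Fraction(1, 10 ** 6)
path_a = [Fraction(c) for c in (0, 1, 3)]   # one mode of approach
path_b = [Fraction(c) for c in (0, 1, 2)]   # another mode of approach
```

Along either path `f(approach(k, path, z))` differs from k only by a multiple of z, while `grad_f(approach(k, path_a, z))` and `grad_f(approach(k, path_b, z))` differ from each other however small z is taken.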
In part 2 of the paper it is stated “Now when the $x_i$ all approach a then both
f and $\partial f/\partial x_i$ become indeterminate forms. However, in this case f takes an
indeterminate form which can be evaluated and it can be shown that $\mu_3/\mu_2$
will always have the value zero, i.e., f will have the value a when all the $x_i \to a$;
while the $\partial f/\partial x_i$ can take any value whatever and in general the $\partial f/\partial x_i$ will
not be equal when the $x_i \to a$.” This statement really amounts to saying that
the f function does not satisfy axiom IV, but it is there used to demonstrate
that one of Schiaparelli’s propositions is false.


3. Having exhibited a function different from the arithmetic mean, and supposedly
satisfying all the four axioms, the question is asked “Where is the proof
given by Whittaker & Robinson lacking in rigor?” After numbering the
various steps in the derivation “... for the sake of rigor and careful reasoning
...” it is stated (p. 174), “The sixth step involves the tacit assumption that
the partial derivatives are functions of k. These partial derivatives are not
necessarily functions of k ...” and it is therefore concluded that the sixth
step is not valid. Now, how can any function that by definition is to be evaluated
at $\theta k x_i$ not be a function of k? What is shown (pp. 174-5) is that
these derivatives do not necessarily involve k explicitly, but this is neither
implied nor necessary for the sixth step, and there is no ground for doubting
its validity.




































4. In order to overcome the supposed defect in the sixth step, it is proposed
to change axiom IV so as to require the partial derivatives to be constants.
But even then (p. 175) “... there remains an objection in the seventh step.”
Now, the seventh step consists of the statement that if

\[ \phi(x) = \sum c_i x_i \]

where the c’s are independent of the x’s, then due to the condition that $\phi$ be a
symmetric function, all the c’s must be equal. To show the defect in this
step it is stated that under certain conditions “... the function $f = \bar{x} + a\,\mu_3/\mu_2$
will have partial derivatives with respect to $x_i$ which are unequal and constant;
yet at the same time the function f is a symmetrical expression of the n variables.”
Granting that all that is correct, what has this got to do with the
seventh step? The f function certainly is not of the type $\sum c_i x_i$ to which
the seventh step is applied.





5. One more point should be mentioned. On p. 181 it is supposedly proven
that any function satisfying the first three axioms must have continuous first
partial derivatives. The proof is essentially as follows: Assuming all the x’s
are given the same increment $\Delta x$, the increment of the function then is $\Delta\phi$.
It is then stated “... but by axiom I, $\Delta\phi = \Delta x$. Therefore $\Delta\phi/\Delta x = 1 = d\phi/dx$.
In other words, the total derivative of $\phi$ exists and is constant. Therefore the
total derivative of $\phi$ is continuous.” From this, the continuity of the first
partial derivatives is proven by means of Euler’s Theorem for homogeneous
functions. Now, just what does the symbol $d\phi/dx$ (which is called the total
derivative) mean for a function of many independent variables? Besides,
(whatever this symbol means) is it considered rigorous to deduce a general
Theorem from the very special case where all the differentials are made equal?
This is one place where the f function could be used effectively as an exhibit
of a function satisfying the first three axioms, and not having continuous partial
derivatives.

It is also stated (p. 181) that “... it would seem more satisfactory to postulate
that the function $\phi$ is single valued, for the single-valuedness of a derivative
does not insure the single-valuedness of the integral while the single-valuedness
of a function does insure the single-valuedness of the derivative where the
derivative exists.” This statement is certainly not self evident and requires
proof. For a single variable at least, it is easy to imagine a function represented
by a curve with corners defined in a certain interval. The function then
could be single valued everywhere in the interval, while the derivatives at the
corners may exist and have two distinct values, depending on whether the
corner is approached from the right or the left. On the other hand it is hard
to imagine a curve representing a single valued function such that the integral,
i.e. the function represented by the area under the curve, should not be single
valued.


6. In Conclusion: It is stated in the Introduction that “Since this book has
had wide circulation, it is believed that the errors in this proof should be called
to the attention of the users of the book. The present paper has been prepared
for this purpose.” It is for the same reason that this paper was prepared, to
show that no error has been proven to exist.


BUREAU OF ORDNANCE, U. S. Navy DEPARTMENT 


















NOTE ON THE BINOMIAL DISTRIBUTION 
By C. E. Clark


The purpose of this note is to show that

\[ f(x) = (-1)^n\,\frac{n!\,q^n}{\pi}\left(\frac{p}{q}\right)^{x}\frac{\sin \pi x}{x^{(n+1)}} \tag{1} \]

where n is an integer $\geq 0$, $0 < p < 1$, $p + q = 1$, and $x^{(n+1)} = x(x - 1)(x - 2)\cdots(x - n)$,
is a function whose values at x = 0, 1, 2, ..., n are the successive
terms of the expansion of $(q + p)^n$, and also to consider the problem of fitting
f(x) to an observed frequency distribution.
The statement made about (1) can be verified by evaluating (1) as an indeterminate
form. On the other hand, (1) can be derived by observing that the
x-th term (x an integer) of the expansion of $(q + p)^n$ is

\[ \frac{n!}{x!\,(n - x)!}\,p^x q^{n-x} = \frac{\Gamma(n + 1)\,p^x q^{n-x}}{\Gamma(x + 1)\,\Gamma(n - x + 1)}; \tag{2} \]

then (1) can be derived from (2) by means of the product expansions for $\Gamma(x)$
and $\sin \pi x$. This derivation of (1) from (2) can also be carried out by expressing
(2) as a Beta function and then using

\[ B(x + 1,\, n - x + 1) = \int_{-\infty}^{\infty} \frac{e^{(x+1)t}}{(1 + e^{t})^{n+2}}\,dt = (-1)^n\,\frac{\pi\,x^{(n+1)}}{(n + 1)!\,\sin \pi x}. \]

This integration can be performed by means of the theory of residues.
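
The statement about (1) can be checked numerically. The Python sketch below assumes the form $f(x) = (-1)^n\,n!\,q^n\,\pi^{-1}(p/q)^x \sin \pi x / x^{(n+1)}$, which follows from (2) and the reflection formula for the Gamma function; the function names are hypothetical. Since (1) is an indeterminate form at the integers, the sketch evaluates it a small distance away from each integer.

```python
from math import comb, factorial, pi, sin

def falling_product(x, n):
    """x^(n+1) = x (x - 1) (x - 2) ... (x - n)."""
    result = 1.0
    for j in range(n + 1):
        result *= x - j
    return result

def f(x, n, p):
    """Formula (1), valid for non-integer x (0/0 at x = 0, 1, ..., n)."""
    q = 1.0 - p
    return ((-1) ** n * factorial(n) * q ** n / pi
            * (p / q) ** x * sin(pi * x) / falling_product(x, n))

def binomial_term(k, n, p):
    """The k-th term of the expansion of (q + p)^n."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def max_discrepancy(n, p, eps=1e-7):
    """Largest gap between f just beside each integer and the binomial term."""
    return max(abs(f(k + eps, n, p) - binomial_term(k, n, p))
               for k in range(n + 1))
```

For instance, `max_discrepancy(6, 0.3)` should be small compared with the binomial terms themselves.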
Consider the problem of fitting (1) to an observed frequency distribution.
We shall write (1) in the form

\[ F(z) = a\,b^{x}\,\frac{\sin \pi x}{x^{(n+1)}}, \qquad x = \frac{nb}{1 + b} + h(z - \bar{z}) \tag{3} \]

and determine the constants a, b, n, and h so that, when $\bar{z}$ is the mean of the
observed distribution, F(z) will fit the distribution.

The values of a, b, n, and h can be determined by the method of moments.
Let $\nu_2$, $\nu_3$, and $\nu_4$ denote the usual second, third, and fourth moments of the
distribution, which are calculated in the usual way (as in W. P. Elderton,
Frequency-Curves and Correlation) and not adjusted by any procedure such as
Sheppard’s adjustments. Also, use the usual notation $\beta_1 = \nu_3^2/\nu_2^3$ and $\beta_2 = \nu_4/\nu_2^2$.








Then, the method of moments gives

\[ n = \frac{2}{3 + \beta_1 - \beta_2} \tag{4} \]

\[ b = \frac{2 + n\beta_1 \pm \sqrt{n\beta_1\,(4 + n\beta_1)}}{2} \tag{5} \]

\[ h = \sqrt{\frac{nb}{\nu_2}}\left(\frac{1}{1 + b}\right) \]

\[ a = (-1)^n\,\frac{n!\,h\,\Sigma f}{\pi\,(1 + b)^n}, \quad \text{where } \Sigma f \text{ is the sum of the frequencies of the distribution.} \]

An integer n is chosen nearest the value assigned by (4). The two values of
b from (5) determine two curves that are congruent but whose skewnesses are
of opposite sign. Hence, b is uniquely determined by (5) and the sign of the
skewness of the data.

For a symmetrical distribution, b = 1, $\nu_3 = 0$, and

\[ a = (-1)^n\,\frac{n!\,\sqrt{n}\,\Sigma f}{2^{n+1}\,\pi\,\sqrt{\nu_2}}. \]


We shall consider an illustrative example. In the following table the columns
f(z) and $f_2(z)$ are taken from W. P. Elderton, Frequency-Curves and Correlation
(1906), page 62. f(z) is an empirical frequency distribution, while $f_2(z)$ is
obtained by fitting a Pearson Type II curve to the distribution f(z). $f_1(z)$ is
computed from

\[ f_1(z) = -1624\,\frac{\sin \pi x}{x^{(6)}}, \qquad x = 2.0973 + .808z \]

which is determined by the method of this note. $f_3(z)$ is obtained by fitting
the normal curve

\[ f_3(z) = 485.1\,\exp\left[-\frac{(z - .4988)^2}{2(1.3829)^2}\right] \]

   z     f(z)    f_1(z)   f_2(z)   f_3(z)

  -3      11       18       14       19
  -2     116      107      109       92
  -1     274      281      286      263
   0     451      438      433      444
   1     432      437      433      444
   2     267      267      285      263
   3     116      106      109       92
   4      16       18       14       19


The coefficients of goodness of fit for $f_1(z)$, $f_2(z)$, and $f_3(z)$ are respectively
.35, .58, and .02.
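
The constants of the fitted curve can be recomputed from the tabulated frequencies f(z). The Python sketch below is offered only as an illustration. It assumes the relations $n = 2/(3 + \beta_1 - \beta_2)$, $h = \sqrt{nb/\nu_2}/(1 + b)$ and $a = (-1)^n\,n!\,h\,\Sigma f/[\pi(1 + b)^n]$, which follow from the binomial moments, and it takes b = 1 (the symmetrical case), as the example appears to do. All names are hypothetical.

```python
from math import factorial, pi, sin, sqrt

z_values = list(range(-3, 5))
freqs = [11, 116, 274, 451, 432, 267, 116, 16]   # column f(z) of the table

def central_moments(zs, fs):
    """Total frequency, mean, and unadjusted moments nu_2, nu_3, nu_4."""
    total = sum(fs)
    mean = sum(z * w for z, w in zip(zs, fs)) / total
    nus = [sum(w * (z - mean) ** r for z, w in zip(zs, fs)) / total
           for r in (2, 3, 4)]
    return total, mean, nus

total, z_bar, (nu2, nu3, nu4) = central_moments(z_values, freqs)
beta1 = nu3 ** 2 / nu2 ** 3
beta2 = nu4 / nu2 ** 2

n = round(2 / (3 + beta1 - beta2))   # nearest integer to the moment value
b = 1.0                              # assumption: symmetrical case
h = sqrt(n * b / nu2) / (1 + b)
a = (-1) ** n * factorial(n) * h * total / (pi * (1 + b) ** n)

def F(z):
    """Fitted curve a * b^x * sin(pi x) / x^(n+1), x = nb/(1+b) + h(z - z_bar)."""
    x = n * b / (1 + b) + h * (z - z_bar)
    product = 1.0
    for j in range(n + 1):
        product *= x - j
    return a * b ** x * sin(pi * x) / product
```

Under these assumptions the computed `n`, `h` and `a` can be compared with the constants 6 = n + 1, .808 and 1624 appearing in $f_1(z)$, and `F(z)` with the column $f_1(z)$.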













CONVEXITY PROPERTIES OF GENERALIZED MEAN VALUE
FUNCTIONS¹


By Nilan Norris


Consider the following generalized mean value functions: (1) the unit weight
or simple sample form, $\phi(t) = \left(\dfrac{x_1^t + x_2^t + \cdots + x_n^t}{n}\right)^{1/t}$, in which the $x_i$ are positive
real numbers not all equal each to each, and in which t may take any real
value; (2) the weighted sample form, $w(t) = \left(\dfrac{c_1 x_1^t + c_2 x_2^t + \cdots + c_n x_n^t}{c_1 + c_2 + \cdots + c_n}\right)^{1/t}$,
in which the $c_i$ are positive numbers not all equal each to each, and in which the
$x_i$ and t are restricted as in $\phi(t)$; (3) the integral form, $\theta(t) = \left[\int_{x=0}^{1} x^t\,dx\right]^{1/t}$,
where $\int_{x=0}^{1} x^t\,dx$ exists for every real value of t; and (4) the generalized integral
form $\Phi(t) = \left[\int_{x=0}^{\infty} x^t\,d\psi(x)\right]^{1/t}$, where $\psi(x)$ is a non-decreasing function integrable
in the Riemann-Stieltjes sense such that $\psi(\infty) - \psi(0) = 1$, and such that
$\int_{x=0}^{\infty} x^t\,d\psi(x)$ exists for every real value of t. The facts that all of these functions
are monotonic increasing and that both $\phi(t)$ and w(t) have two horizontal
asymptotes have been previously demonstrated.² Although the existence of
$\phi(t)$ and w(t) has been known since 1840,³ there appears to have been no attempt
made to investigate the behavior of their second derivatives.

When the $x_i$ are price relatives, production relatives, or similar data, $\phi(t)$
and w(t) yield common types of index numbers by direct substitution of integral
values of t. For any values of t such that $0 < t_1 < t_2 < \infty$, the type bias of
$\phi(t_2)$ will be greater than the type bias of $\phi(t_1)$. Similarly, for any values of t
such that $-\infty < t_1 < t_2 < 0$, the type bias of $\phi(t_1)$ will be greater than the
type bias of $\phi(t_2)$. The second derivatives of $\phi(t)$ and w(t) indicate whether


1 Presented at a joint meeting of the American Mathematical Society, the Econometric 
Society, and the Institute of Mathematical Statistics at St. Louis on January 2, 1936. 
The writer is indebted to C. C. Craig, Einar Hille, Dunham Jackson, and J. Shohat for 
helpful critical reviews of the preliminary draft of this paper. 

2 G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities (Cambridge University
Press, London, 1934), pp. 12-15; and Nilan Norris, “Inequalities among Averages,” Annals
of Mathematical Statistics, Vol. VI, No. 1, March, 1935, pp. 27-29.

3 Jules Bienaymé, Société Philomatique de Paris, Extraits des procés-verbaux des séances 
pendant l’année 1840 (Imprimerie D’A. René et Cie., Paris, 1841), Séance du 13 juin 1840 
p. 68. 




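type bias is changing at an increasing or a decreasing rate as between the unlimited
number of averages available for use. Considerable interest attaches
to w(t), the weighted sample form of function.
Let w(t) be made arbitrary for the case of n = 2, with $x_1 = 1$, and $x_2 = e^{-\lambda}$,
where $\lambda$ is any real number. Also let $c_1 = \alpha$, and $c_2 = \beta$, where $\alpha + \beta = 1$.
Then $w(t) = [\alpha + \beta e^{-\lambda t}]^{1/t}$. Now for all values of t,

\[ \alpha + \beta e^{-\lambda t} = 1 - \beta\lambda t + \frac{\beta\lambda^2}{2}\,t^2 - \frac{\beta\lambda^3}{6}\,t^3 + \cdots \]

For | t | sufficiently small, it follows that

\[ \log(\alpha + \beta e^{-\lambda t}) = -\beta\lambda t + \tfrac{1}{2}\beta\lambda^2(1 - \beta)\,t^2 + \beta\lambda^3\left[-\frac{1}{6} + \frac{\beta}{2} - \frac{\beta^2}{3}\right]t^3 + \cdots, \]

so that for $t \neq 0$

\[ \frac{1}{t}\log(\alpha + \beta e^{-\lambda t}) = -\beta\lambda + \tfrac{1}{2}\beta\lambda^2(1 - \beta)\,t + \beta\lambda^3\left[-\frac{1}{6} + \frac{\beta}{2} - \frac{\beta^2}{3}\right]t^2 + \cdots \]

Therefore

\[ w(t) = \exp\left[\frac{1}{t}\log(\alpha + \beta e^{-\lambda t})\right] = e^{-\beta\lambda}\left[1 + \tfrac{1}{2}\beta\lambda^2(1 - \beta)\,t + \left\{\beta\lambda^3\left(-\frac{1}{6} + \frac{\beta}{2} - \frac{\beta^2}{3}\right) + \tfrac{1}{8}\beta^2\lambda^4(1 - \beta)^2\right\}t^2 + \cdots\right] \]

It follows that $w''(0) = 2\beta\lambda^3 e^{-\beta\lambda}\left[-\frac{1}{6} + \frac{\beta}{2} - \frac{\beta^2}{3} + \frac{\beta}{8}(1 - \beta)^2\lambda\right]$. It is clear
that w(0) is the weighted geometric mean, and that $\phi(0)$ is the unit weight or
simple sample form of geometric mean. As a means of demonstrating the range
of values which $w''(0)$ may take it is helpful to rewrite the expression for $w''(0)$
as follows:

\[ w''(0) = \tfrac{1}{4}\beta^2(1 - \beta)^2\lambda^3\left[\lambda - \frac{4}{3}\cdot\frac{1 - 2\beta}{\beta(1 - \beta)}\right]e^{-\beta\lambda} = f(\lambda, \beta). \]

This consideration makes it possible to distinguish three cases of $y = f(\lambda, \beta)$
for fixed $\beta$, namely, $0 < \beta < \tfrac{1}{2}$; $\beta = \tfrac{1}{2}$; and $\tfrac{1}{2} < \beta < 1$. In all three cases
$f(\lambda, \beta)$ has an absolute minimum $u(\beta) \leq 0$. The values of $\lambda$ at which
$\partial f/\partial\lambda = 0$ satisfy the quadratic equation

\[ \lambda^2 - \frac{4(4 - 5\beta)}{3\beta(1 - \beta)}\,\lambda + \frac{4 - 8\beta}{\beta^2(1 - \beta)} = 0. \]

It is clear that by taking $\beta$ near enough to 0, one can make $u(\beta)$ as large negative
as is desired. Also, by choosing $\lambda$ properly, one can make $w''(0)$ take any
value between $u(\beta)$ and $\infty$. For example, when $\alpha = \beta = \tfrac{1}{2}$, $\lambda$ may be selected
so as to make $w''(0)$ any arbitrarily chosen non-negative number. For then
$w''(0) = \frac{\lambda^4}{64}\,e^{-\lambda/2}$, and as $\lambda$ increases from $-\infty$ to 0, $w''(0)$ decreases from $\infty$ to
0. If $\lambda = 0$, $w''(0) = 0$. If $\lambda > 0$, as $\lambda$ increases from 0 to 8, $w''(0)$ increases to
$64e^{-4}$, and as $\lambda$ increases beyond 8, $w''(0)$ decreases, approaching 0 as $\lambda$ increases
indefinitely. It is evident that the case of $\alpha = \beta = \tfrac{1}{2}$, with $\lambda = -\log 2$, $x_1 = 1$,
and $x_2 = e^{-\lambda} = 2$, is one in which w(t) becomes the unit weight or simple sample
type of generalized mean value function, namely, $\phi(t) = \left(\dfrac{1 + 2^t}{2}\right)^{1/t}$. Reference
to the first expression above noted for $w''(0)$ will make clear that
$\phi''(0) = \dfrac{\sqrt{2}}{64}(\log 2)^4$ in this special case.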
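
The closed expression for $w''(0)$ can be compared with a numerical second difference of w(t) at t = 0. The Python sketch below is illustrative only: it assumes $w(t) = [\alpha + \beta e^{-\lambda t}]^{1/t}$ with $\alpha = 1 - \beta$ and $w(0) = e^{-\beta\lambda}$, and the function names and the step `h` are choices made for this sketch.

```python
from math import exp, log, sqrt

def w(t, lam, beta):
    """Weighted mean of x1 = 1 and x2 = exp(-lam), weights alpha = 1 - beta and beta."""
    if t == 0.0:
        return exp(-beta * lam)           # limiting value: weighted geometric mean
    return exp(log((1.0 - beta) + beta * exp(-lam * t)) / t)

def w2_formula(lam, beta):
    """Closed form (1/4) b^2 (1-b)^2 lam^3 [lam - 4(1-2b)/(3b(1-b))] exp(-b lam)."""
    return (0.25 * beta ** 2 * (1 - beta) ** 2 * lam ** 3
            * (lam - 4 * (1 - 2 * beta) / (3 * beta * (1 - beta)))
            * exp(-beta * lam))

def w2_numeric(lam, beta, h=1e-3):
    """Central second difference of w at t = 0."""
    return (w(h, lam, beta) - 2 * w(0.0, lam, beta) + w(-h, lam, beta)) / h ** 2
```

For $\alpha = \beta = \tfrac{1}{2}$ the closed form reduces to $\lambda^4 e^{-\lambda/2}/64$, so `w2_formula(8, 0.5)` can be compared with $64e^{-4}$.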


Analysis of $\Phi(t)$, the generalized integral form of generalized mean value
function, makes it possible to characterize populations of a very general character,
as well as samples. But in the case of $\Phi(t)$ it is even more difficult to
generalize as to convexity properties. For example, let

\[ \Phi(t) = \left[\int_{u=-\infty}^{\infty} e^{-ut}\,dE(u)\right]^{1/t} \]

where

\[ E(u) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{u} e^{-v^2}\,dv. \]

This expression is obviously of the required generalized integral type. Now

\[ [\Phi(t)]^{t} = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-ut - u^2}\,du = e^{t^2/4}\,\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-(u + t/2)^2}\,du = e^{t^2/4}. \]

Therefore $\Phi(t) = e^{t/4}$, and $\Phi''(t) = \frac{1}{16}e^{t/4} > 0$ for all t. That is, in this particular
case, $\Phi(t)$ has only one horizontal asymptote.

The foregoing examples indicate that the following conclusions may be drawn
as to the diverse convexity attributes of the various means as functions of t:
(1) The unit weight form, $\phi(t)$, and the weighted sample form, w(t), must always
have a point of inflection, since both of them not only increase with t, but are
doubly asymptotic (have two horizontal asymptotes). (2) Points of inflection
for $\phi(t)$ and w(t) do not necessarily occur at t = 0. (3) The generalized integral
form, $\Phi(t)$, need not always have a point of inflection. That is, the second
derivatives of certain forms of $\Phi(t)$ do not change their sign, since such forms
are concave upward.


UNIVERSITY OF MICHIGAN. 















A SIMPLE FORM OF PERIODOGRAM 
By Dinsmore Alter


Schuster’s introduction of a method of systematic search for hidden periodici- 
ties and cycles opened a new field for the investigator of statistical data. The 
beauty of his method in its analogy to analysis of light, and the great reputa- 
tion of its author, combined to give it universal acceptance and to blind statis- 
ticians to its faults. 

In more recent years at least three new mathematical and two mechanical 
forms of periodogram analysis have been proposed, each of which exhibits 
certain advantages over the original one. The use of the term periodogram 
for these forms is an extension of Schuster’s original definition which used as 
abscissae quantities proportional to the squares of the amplitudes of the sine 
terms found in the data for the various trial periods. He wrote: “It is con- 
venient to have a word for some representation of a variable quantity which 
shall correspond to the spectrum of a luminous radiation. I propose the word 
periodogram and define it more particularly in the following way: 


\[ \text{Let } \tfrac{1}{2}Ta = \int_{t_1}^{t_1+T} f(t)\cos kt\,dt \quad \text{and} \quad \tfrac{1}{2}Tb = \int_{t_1}^{t_1+T} f(t)\sin kt\,dt \]

where T may for convenience be chosen equal to some integer multiple of $2\pi/k$,
and plot a curve with $2\pi/k$ as abscissae and $r = \sqrt{a^2 + b^2}$ as ordinates; this curve,
or better, the space between this curve and the axis of abscissae, represents the
periodogram of f(t).”
The following appear to be the essential criteria for a satisfactory form of
periodogram:

1. It must exhibit plainly any repetition of form in the data regardless of
how irregular the shape of the repeated interval may be. In doing this it
must exaggerate the amplitude of the main terms at the expense of the
lesser ones.

2. The calculation of the indices must be short. In a periodogram from
many data the indices sometimes are computed for several hundred trial
periods.

3. There should be a geometrical interpretation of the index used.

4. The frequency distribution of the index must be known.

5. Combining or smoothing the data should modify the index in a manner
which leaves an obvious interpretation.

The Schuster periodogram has the following disadvantages: 

1. Only sine terms of large amplitude are exhibited. A perfect repetition
of an extremely irregular form of data would not be indicated in any way.

2. The calculations are long.

3. There is a considerable uncertainty in the length of the period found.
Those methods of analysis which use harmonics as well as the fundamental
have much less of this uncertainty.

The correlation periodogram has advantages in each of these points over the 
Schuster. However, even with it the calculations are fairly long. Further- 
more, the modification of the coefficient introduced by grouping or smoothing 
is not a linear one. 

The periodogram described here is a slight modification of one for which a 
preliminary note was published in 1933. Additional features have been studied 
and its applications to many data have shown its ease of calculation. This 
calculation has been reduced still more by a mechanical method which renders 
it practicable to contemplate the possibility of studying many data hitherto 
prohibited by excessive cost. 

Consider data $(x_0, x_1, x_2, \cdots, x_i, \cdots, x_{n-1})$. Let l be any integer less than n.
Form the sum of the absolute values of $x_i - x_{i-l}$, designated by $\sum |x_i - x_{i-l}|$.
Define

\[ \Delta_l = \frac{\sum_{i=l}^{n-1} |x_i - x_{i-l}|}{n - l}. \]

l takes the values of the various trial periods and
is called the lag. $\Delta_l$, therefore, is the mean error between prediction that data
will be repeated after a lag of l and the fulfillment of the prediction. Such
an index has a meaning that is immediately of use to a meteorologist or other
investigator. Coefficients such as the Schuster and the correlation coefficient,
although valuable statistically, are of less immediate interest.

The standard deviation of these errors of prediction follows at once from
standard formulae under assumption of normal distribution.

\[ \sigma = 1.25\,\Delta_l \]


The distribution of $\sigma$, as computed from the absolute values of data, has
been studied by Helmert and by Fisher. Davies and E. S. Pearson have compared
the various methods of estimating $\sigma$. For the large number, (n − l),
pairs of data used for a periodogram point, this method becomes almost as
precise as the usual one which would square the values of $(x_i - x_{i-l})$. For
(n − l) as small as 50, the standard deviation of the standard deviation by this
method is only seven percent larger than by the other one. Fisher has shown
that

\[ \sigma_\sigma^2 = \frac{\pi - 2}{2(n - l)}\,\sigma^2 \]

This may be written as

\[ \sigma_\sigma = \frac{1.068\,\sigma}{\sqrt{2(n - l)}} \]

The distribution approaches normal rapidly and for all values of (n − l) that
would be used in periodogram calculation certainly may be considered as normal.
It will be very seldom that a value of (n − l) much smaller than 200 will be
used.

The data may be printed on two strips of adding machine tape held together
by clips so as to match data separated by a lag l. In arranging them for investigation,
it usually is most convenient to make all numbers positive. The
computer subtracts mentally and puts the difference into an adding machine,
which gives him $\Delta_l$ almost immediately.


For some computers, and especially where the numbers are large, another
method of obtaining $\Delta_l$ may save time or lead to fewer numerical mistakes. The
computer will form the sum of all his data. He will, as for the other form of
computation, put these on two pieces of adding machine tape that he lays side
by side. However, instead of putting the difference of the pairs into the machine,
he will, in each case, put in the smaller datum of the pair. Then,

\[ (n - l)\,\Delta_l = 2\sum \text{all data} - \left[\sum \text{first } l \text{ data} + \sum \text{last } l \text{ data}\right] - 2\sum \text{smaller} \]

The derivation of this equation is obvious: each absolute difference is the sum
of the pair less twice the smaller datum, and the first (n − l) data together with
the last (n − l) data make up all the data twice, less the last l and the first l.
In computing by this method the
subtotaler on the machine can be used to make the strip of sums of the first
(n − l) data and of the last (n − l) for all values of l. The first term on the
right hand side is a constant, the last is twice the sum of the smaller numbers
chosen in the pairs. I have computed by both methods, and where the numbers
are small, I prefer the former. Where they are large, I prefer the latter. However,
when one must use comparatively untrained computers, he will find fewer
mistakes made if the computer does not make the subtractions.
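
The two ways of obtaining the index can be set side by side in a few lines. The Python sketch below is an illustration, not the procedure used on the adding machine. The function names are hypothetical, and the second function rests on the identity |u − v| = u + v − 2 min(u, v), with the sums of the first and last (n − l) data written as the sum of all data less the last l and the first l respectively.

```python
def delta_by_differences(data, lag):
    """Mean absolute difference of data matched after the given lag."""
    pairs = len(data) - lag
    return sum(abs(data[i] - data[i - lag])
               for i in range(lag, len(data))) / pairs

def delta_by_smaller(data, lag):
    """Same index, entering only the smaller datum of each pair."""
    n = len(data)
    pairs = n - lag
    smaller = sum(min(data[i], data[i - lag]) for i in range(lag, n))
    first_l = sum(data[:lag])        # data missing from the "last n - l" strip
    last_l = sum(data[n - lag:])     # data missing from the "first n - l" strip
    return (2 * sum(data) - (first_l + last_l) - 2 * smaller) / pairs

example = [3, 7, 2, 9, 4, 8, 1, 6]   # made-up data for illustration
```

For the made-up series above and a lag of 3, both functions give the same value; the second never forms a difference of two data.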

The calculation of $\Delta_l$ is much shorter than that for the indices even of the
correlation and variance periodograms. It may, however, be shortened even
more by a mechanical arrangement. $(n - l)\Delta_l$ is the area between two histograms
of the data matched after a lag l. These may be carefully graphed on a
large scale and two such graphs superposed over a table with a translucent
illuminated top. On the edge of this table is the track to guide a rolling planimeter.
$\Delta_l$, as computed by this means, is accurate to approximately one-half
of one percent of its value, a much more exact value than is needed. The
details of such a device as constructed for the Griffith Observatory are shown by
the accompanying photograph and diagram. The dual saving of time by the
method and by its mechanical application have resulted in the adoption of a
much more ambitious program of meteorological research than previously was
contemplated.











[Figure: Planimeter device for mechanical calculation, with scale diagram of the planimeter device (planimeter table, Griffith Observatory, Los Angeles).]



The form taken by the periodogram is important. Consider the simplest
case, data which follow a sine curve.

\[ y_i = a\cos\left(\frac{2\pi i - c}{p}\right) \]

\[ y_i - y_{i-l} = 2a\sin\frac{\pi l}{p}\left[\sin\frac{2\pi(\tfrac{1}{2}l - i) + c}{p}\right] \]

The term in brackets takes values distributed around the circle and the part
outside is a constant for any one lag. The bracket term sums approximately to
$\dfrac{2(n - l)}{\pi}$, since we consider all terms as of one sign only.

\[ \Delta_l = \frac{4a}{\pi}\sin\frac{\pi l}{p} \]

[Figure: Northern California rainfall periodogram; abscissae: trial periods in months.]

If the absolute values were not considered in the expression for $\Delta_l$, the periodogram
would be a sine curve of period 2p. The lack of sign gives a cusp curve
with the cusp at lags p, 2p, etc. Such a form is advantageous in that the
periodogram gives sharp peaks at multiples of the periods which may exist.
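
The cusp form can be reproduced on artificial data. The Python sketch below generates a pure cosine series and compares the index at several lags with $(4a/\pi)\,|\sin(\pi l/p)|$. The series length, amplitude and period are arbitrary choices, and the function names are hypothetical.

```python
from math import cos, pi, sin

def delta(data, lag):
    """Mean absolute difference of data matched after the given lag."""
    return sum(abs(data[i] - data[i - lag])
               for i in range(lag, len(data))) / (len(data) - lag)

def cosine_series(n, amplitude, period, phase=0.0):
    """Artificial data y_i = a cos((2 pi i - c) / p)."""
    return [amplitude * cos((2 * pi * i - phase) / period) for i in range(n)]

def predicted_delta(amplitude, period, lag):
    """Approximate index for a pure cosine: (4a/pi) |sin(pi l / p)|."""
    return 4 * amplitude / pi * abs(sin(pi * lag / period))

series = cosine_series(20000, 2.0, 37.3)   # arbitrary length, amplitude, period
```

Evaluating `delta(series, lag)` over a range of lags traces the cusp curve; at a lag equal to a whole number of periods (here 373, ten periods) the index falls essentially to zero.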

The effect of the periodogram in exaggerating the principal terms at the
expense of the smaller ones may be obtained most easily by equating $\sigma$ as
obtained by the linear and the quadratic formulae.








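The data may be written as the sum of cosine terms

\[ y_i = a\cos\left(\frac{2\pi i - c_a}{p_a}\right) + b\cos\left(\frac{2\pi i - c_b}{p_b}\right) + \cdots + e_i, \]

\[ y_i - y_{i-l} = 2a\sin\frac{\pi l}{p_a}\,\sin\left[\frac{2\pi(\tfrac{1}{2}l - i) + c_a}{p_a}\right] + \cdots + (e_i - e_{i-l}) \]

\[ \sum (y_i - y_{i-l})^2 = 2(n - l)\,a^2\sin^2\frac{\pi l}{p_a} + 2(n - l)\,b^2\sin^2\frac{\pi l}{p_b} + \cdots + (n - l)\,(\sqrt{2}\,\epsilon)^2 \]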


The sine terms contribute to $\Delta_l^2$ in proportion to the squares of their amplitudes.
On account of the $\sin^2\dfrac{\pi l}{p_j}$ factor, they contribute very little to values
of $\Delta_l$ for which $\dfrac{\pi l}{p_j}$ is not very closely an even multiple of $\pi$.

This method has been applied to rainfall data of the Pacific Coast and has 
proved as satisfactory in practice as would be expected from the simplicity 
of the theory. The periodogram of rainfall stations along the northern third 
of the California coast is shown here, exhibiting perhaps the most definite 
single piece of evidence ever found for rainfall cycles. Outstanding is a cycle 
of about 45 years with its fourth harmonic as the secondary feature. The 
writer expects to publish the results of that work in the Monthly Weather 
Review. 








